Abstract

Recently, a vector version of Witsenhausen's counterexample was considered and it was shown that in the
limit of infinite vector length, certain quantization-based control strategies are provably within a constant factor
of the optimal cost for all possible problem parameters. In this paper, finite vector lengths are considered with 
the dimension being viewed as an additional problem parameter. By applying a large-deviation "sphere-packing" 
philosophy, a lower bound to the optimal cost for the finite dimensional case is derived that uses appropriate shadows 
of the infinite-length bound. Using the new lower bound, we show that good lattice-based control strategies achieve 
within a constant factor of the optimal cost uniformly over all possible problem parameters, including the vector 
length. For Witsenhausen's original problem — the scalar case — the gap between regular lattice-based strategies 
and the lower bound is numerically never more than a factor of 8. 



I. Introduction

Distributed control problems have long proved challenging for control engineers. In 1968, Witsenhausen [1]
gave a counterexample showing that even a seemingly simple distributed control problem can be hard to
solve. For the counterexample, Witsenhausen chose a two-stage distributed LQG system and provided a
nonlinear control strategy that outperforms all linear laws. It is now clear that the non-classical
information pattern of Witsenhausen's problem makes it quite challenging:^1 the optimal strategy and the
optimal costs for the problem are still unknown, and non-convexity makes the search for an optimal strategy
hard [3]-[5]. Discrete approximations of the problem [6] are even NP-complete^2 [7].

1 In the words of Yu-Chi Ho [2], "the simplest problem becomes the hardest problem."

In the absence of a solution, research on the counterexample has bifurcated into two different directions. 
Since there is no known systematic approach to obtain provably optimal solutions, a body of literature 
(e.g., [4], [5], [8] and the references therein) applies search heuristics to explore the space of possible
control actions and obtain intuition into the structure of good strategies. Work in this direction has also 
yielded considerable insight into addressing non-convex problems in general. 

In the other direction the emphasis is on understanding the role of implicit communication in the 
counterexample. In distributed control, control actions not only attempt to reduce the immediate control 
costs, they can also communicate relevant information to other controllers to help them reduce costs. 
Witsenhausen [1, Section 6] and Mitter and Sahai [9] aim at developing systematic constructions based 
on implicit communication. Witsenhausen's two-point quantization strategy is motivated by the optimal
strategy for two-point symmetric distributions of the initial state [1, Section 5] and it outperforms linear
strategies for certain parameter choices. Mitter and Sahai [9] propose multipoint-quantization strategies 
that, depending on the problem parameters, can outperform linear strategies by an arbitrarily-large factor. 

Various modifications to the counterexample investigate if misalignment of these two goals of control
and implicit communication makes the problems hard [3], [10]-[14] (see [15] for a survey of other such
modifications). Of particular interest are two works, those of Rotkowitz and Lall [12], and Rotkowitz [14].
The first work [12] shows that with extremely fast, infinite-capacity, and perfectly reliable external
channels, the optimal controllers are linear not just for Witsenhausen's counterexample (which is
a simple observation), but for more general problems as well. This suggests that allowing for an external
channel between the two controllers in Witsenhausen's counterexample might simplify the problem.
However, when the channel is not perfect, Martins [16] shows that finding optimal solutions can be
hard.^3 A closer inspection of the problem in [16] reveals that nonlinear strategies can outperform linear
ones by an arbitrarily large factor for any fixed SNR on the external channel. Even to make good use of
the external channel resource, one needs nonlinear strategies.

2 More precisely, results in [7] imply that the discrete counterparts to the Witsenhausen counterexample are NP-complete if the assumption
of Gaussianity of the primitive random variables is relaxed. Further, it is also shown in [7] that with this relaxation, a polynomial time
solution to the original continuous problem would imply P = NP, and thus conceptually the relaxed continuous problem is also hard.

The second work [14] shows that if one considers the induced norm instead of the original expected 
quadratic cost, linear control laws are optimal and easy to find. The induced norm formulation is therefore 
easy to solve, and at the same time, it makes no assumptions on the state and the noise distributions. This 
led Doyle to ask if Witsenhausen's counterexample (with expected quadratic cost) is at all relevant [21] — 
after all, not only is the LQG formulation more constrained, it is also harder to solve. The question thus 
becomes what norm is more appropriate, and the answer must come from what is relevant in practical 
situations. In practice, one usually knows the "typical" amplitude of the noise and the initial state, or at
least rough bounds on them. The induced-norm formulation may therefore be quite conservative: since no
assumptions are made on the state and the noise, it requires budgeting for completely arbitrary behavior of
state and noise, which can even collude to raise the costs for the chosen strategy. To see how conservative
the induced-norm formulation can be, notice the following: even allowing for colluding state and noise,
mere knowledge of a bound on the noise amplitude suffices to have quantization-based nonlinear strategies
outperform linear strategies by an arbitrarily large factor (with the expected cost replaced by a hard budget;
the proof is simpler than that in [9], and is left as an exercise to the interested reader for reasons of
limited space). Conceptually, the LQG formulation is only abstracting some knowledge of noise and initial
state behavior. In practical situations where such knowledge exists, designs based on an induced norm 
formulation (and linear strategies) may be needlessly expensive because they budget for impossible events. 

3 Martins shows that nonlinear strategies that do not even use the external channel can outperform linear ones that do use the channel when
the external channel SNR is high. As is suggested by what David Tse calls the "deterministic perspective" (along the lines of [17]-[19]),
linear strategies do not make good use of the external channel because they only communicate the "most significant bits", which can
anyway be estimated reliably at the second controller. So if the uncertainty in the initial state is large, the external channel is only of limited
help and there may be substantial advantage in having the controllers talk through the plant. A similar problem is considered by Shoarinejad
et al. in [20], where noisy side information of the source is available at the receiver. Since this formulation is even more constrained than
that in [16], it is clear that nonlinear strategies outperform linear for this problem as well.



The fact that nonlinear strategies can be arbitrarily better brings us to a question that has received
little attention in the literature: how far are the proposed nonlinear strategies from the optimal? It is
believed that the strategies of Lee, Lau and Ho [5] are close to optimal. In Section VI we will see that
these strategies can be viewed as an instance of the "dirty-paper coding" strategy in information theory,
and quantify their advantage over pure quantization based strategies. Despite their improved performance,
there was no guarantee that these strategies are indeed close to optimal.^4 Witsenhausen [1, Section 7]
derived a lower bound on the costs that is loose in the interesting regimes of small $k$ and large $\sigma_0^2$ [15],
[22], and hence is insufficient to obtain any guarantee on the gap from optimality.

Towards obtaining such a guarantee, a strategic simplification of the problem was introduced in [15], 
[23] where we consider an asymptotically-long vector version of the problem. This problem is related to a 
toy communication problem that we call "Assisted Interference Suppression" (AIS) which is an extension 
of the dirty-paper coding (DPC) [24] model in information theory. There has been a burst of interest 
in extensions to DPC in information theory mainly along two lines of work — multi-antenna Gaussian 
channels, and the "cognitive-radio channel." For multi-antenna Gaussian channels, a problem of much 
theoretical and practical interest, DPC turns out to be the optimal strategy (see [25] and the references 
therein). The "cognitive radio channel" problem was formulated by Devroye et al. [26]. This inspired
much work in asymmetric cooperation between nodes [27]-[31]. In our work [15], [23], we developed
a new lower bound to the optimal performance of the vector Witsenhausen problem. Using this bound, 
we show that vector-quantization based strategies attain within a factor of 4.45 of the optimal cost for all 
problem parameters in the limit of infinite vector length. Further, combinations of linear and DPC-based 
strategies attain within a factor 2 of the optimal cost. This factor was later improved to 1.3 in [32] by 
improving the lower bound. While a constant-factor result does not establish true optimality, such results 
are often helpful in the face of intractable problems like those that are otherwise NP-hard [33]. This
constant-factor spirit has also been useful in understanding other stochastic control problems [34], [35]
and in the asymptotic analysis of problems in multiuser wireless communication [17], [36].

4 The search in [5] is not exhaustive. The authors first find a good quantization-based solution. Inspired by piecewise linear strategies
(from the neural networks based search of Baglietto et al. [4]), each quantization step is broken into several small sub-steps to approximate
a piecewise linear curve.

While the lower bound in [15] holds for all vector lengths, and hence for the scalar counterexample as 
well, the ratio of the costs attained by the strategies of [9] and the lower bound diverges in the limit $k \to 0$
and $\sigma_0 \to \infty$. This suggests that there is a significant finite-dimensional aspect of the problem that is
being lost in the infinite-dimensional limit: either quantization-based strategies are bad, or the lower bound 
of [15] is very loose. This effect is elucidated in [22] by deriving a different lower bound showing that 
quantization-based strategies indeed attain within a constant factor of the optimal cost for Witsenhausen's 
original problem. The bound in [22] is in the spirit of Witsenhausen's original lower bound, but is more 
intricate. It captures the idea that observation noise can force a second-stage cost to be incurred unless 
the first stage cost is large. 

In this paper, we revert to the line of attack initiated by the vector simplification of [15]. In Section II,
we formally state the vector version of the counterexample. For obtaining good control strategies, we
observe that the action of the first controller in the quantization-based strategy of [9] can be thought of as
forcing the state to a point on a one-dimensional lattice. Extending this idea, in Section III we provide
lattice-based quantization strategies for finite dimensional spaces and analyze their performance.

Building upon the vector lower bound of [15], a new lower bound is derived in Section IV which is in the
spirit of large-deviations-based information-theoretic bounds for finite-length communication problems^6
(e.g. [40]-[43]). In particular, our new bound extends the tools in [43] to a setting with unbounded
distortion measure. In Section V we combine the lattice-based upper bound (Section III) and the large
deviations lower bound (Section IV) to show that lattice-based quantization strategies attain within a
constant factor of the optimal cost for any finite length, uniformly over all problem parameters. For
example, this constant factor is numerically found to be smaller than 8 for the original scalar problem.
We also provide a constant factor that holds uniformly over all vector lengths.

5 The constant is large in [22], but as this paper shows, this is an artifact of the proof rather than reality.

6 An alternative Central Limit Theorem (CLT)-based approach has also been used in the information-theory literature [37]-[39]. In [38],
[39], the approach is used to obtain extremely tight approximations at moderate blocklengths for Shannon's noisy communication problem.

To understand the significance of the result, consider the following. At $k = 0.01$ and $\sigma_0 = 500$, the cost
attained by the optimal linear scheme is close to 1. The cost attained by a quantization-based^7 scheme is
$8.894 \times 10^{-4}$. Our new lower bound on the cost is $3.170 \times 10^{-4}$. Despite the small value of the lower
bound, the ratio of the quantization-based upper bound and the lower bound for this choice of parameters
is less than three!
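The numbers quoted above can be sanity-checked with a few lines of arithmetic. The sketch below is only illustrative: it assumes a uniform scalar quantizer with the bin spacing of about 9.92 mentioned in the accompanying footnote, approximates the first-stage cost by $k^2 \Delta^2 / 12$ (initial state roughly uniform within each bin), and uses a crude nearest-point error estimate for the second stage. The function name and both approximations are assumptions made here, not the exact calculation behind the reported values.

```python
import math

def scalar_quantization_costs(k, spacing):
    """Rough stage costs for a uniform scalar quantizer (sigma_Z = 1 assumed)."""
    # First stage: the state is moved to the nearest quantization point; if x0 is
    # roughly uniform inside each bin, the mean squared move is spacing^2 / 12.
    first = k ** 2 * spacing ** 2 / 12.0
    # Second stage (crude): a decoding error needs |Z| > spacing / 2 and then
    # costs on the order of spacing^2. erfc(t / sqrt(2)) equals Pr(|Z| > t).
    p_err = math.erfc((spacing / 2.0) / math.sqrt(2.0))
    second = spacing ** 2 * p_err
    return first, second

first, second = scalar_quantization_costs(k=0.01, spacing=9.92)
print(f"first stage  ~ {first:.2e}")
print(f"second stage ~ {second:.2e}")

# Ratio of the quoted quantization cost to the quoted lower bound.
ratio = 8.894e-4 / 3.170e-4
print(f"upper/lower ratio = {ratio:.2f}")
```

The first-stage estimate dominates the total, which is why the choice of bin spacing is essentially a trade-off between $k^2 \Delta^2 / 12$ and a Gaussian tail probability.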



We conclude in Section VI outlining directions of future research and speculating on the form of 
finite-dimensional strategies (following [15]) that we conjecture might be optimal. 



II. Notation and problem statement 
Fig. 1. Block-diagram for vector version of Witsenhausen's counterexample of length m.



7 The quantization points are regularly spaced about 9.92 units apart. This results in a first stage cost of about $8.2 \times 10^{-4}$ and a second
stage cost of about $6.7 \times 10^{-5}$.

Vectors are denoted in bold. Upper case tends to be used for random variables, while lower case symbols
represent their realizations. $W(m, k^2, \sigma_0^2)$ denotes the vector version of Witsenhausen's problem of length
$m$, defined as follows (shown in Fig. 1):

• The initial state $\mathbf{X}_0^m$ is Gaussian, distributed $\mathcal{N}(0, \sigma_0^2 I_m)$, where $I_m$ is the identity matrix of size
$m \times m$.

• The state transition functions describe the state evolution with time. The state transitions are linear:

$$\mathbf{X}_1^m = \mathbf{X}_0^m + \mathbf{U}_1^m, \quad \text{and}$$
$$\mathbf{X}_2^m = \mathbf{X}_1^m - \mathbf{U}_2^m.$$

• The outputs observed by the controllers:

$$\mathbf{Y}_1^m = \mathbf{X}_0^m, \quad \text{and}$$
$$\mathbf{Y}_2^m = \mathbf{X}_1^m + \mathbf{Z}^m, \qquad (1)$$

where $\mathbf{Z}^m \sim \mathcal{N}(0, \sigma_Z^2 I_m)$ is Gaussian distributed observation noise.

• The control objective is to minimize the expected cost, averaged over the random realizations of $\mathbf{X}_0^m$
and $\mathbf{Z}^m$. The total cost is a quadratic function of the state and the input given by the sum of two
terms:

$$J_1(\mathbf{x}_1^m, \mathbf{u}_1^m) = \frac{1}{m} k^2 \|\mathbf{u}_1^m\|^2, \quad \text{and}$$
$$J_2(\mathbf{x}_2^m, \mathbf{u}_2^m) = \frac{1}{m} \|\mathbf{x}_2^m\|^2,$$

where $\|\cdot\|$ denotes the usual Euclidean 2-norm. The cost expressions are normalized by the vector-length
$m$ to allow for natural comparisons between different vector-lengths. A control strategy is
denoted by $\gamma = (\gamma_1, \gamma_2)$, where $\gamma_i$ is the function that maps the observation $\mathbf{y}_i^m$ at $C_i$ to the control
input $\mathbf{u}_i^m$. For a fixed $\gamma$, $\mathbf{x}_1^m = \mathbf{x}_0^m + \gamma_1(\mathbf{x}_0^m)$ is a function of $\mathbf{x}_0^m$. Thus the first stage cost can instead
be written as a function $J_1^{(\gamma)}(\mathbf{x}_0^m) = J_1(\mathbf{x}_0^m + \gamma_1(\mathbf{x}_0^m), \gamma_1(\mathbf{x}_0^m))$ and the second stage cost can be
written as $J_2^{(\gamma)}(\mathbf{x}_0^m, \mathbf{z}^m) = J_2\big(\mathbf{x}_0^m + \gamma_1(\mathbf{x}_0^m) - \gamma_2(\mathbf{x}_0^m + \gamma_1(\mathbf{x}_0^m) + \mathbf{z}^m),\ \gamma_2(\mathbf{x}_0^m + \gamma_1(\mathbf{x}_0^m) + \mathbf{z}^m)\big)$.
For given $\gamma$, the expected costs (averaged over $\mathbf{x}_0^m$ and $\mathbf{z}^m$) are denoted by $\bar{J}^{(\gamma)}(m, k^2, \sigma_0^2)$ and
$\bar{J}_i^{(\gamma)}(m, k^2, \sigma_0^2)$ for $i = 1, 2$. We define $\bar{J}_{\min}(m, k^2, \sigma_0^2)$ as follows

$$\bar{J}_{\min}(m, k^2, \sigma_0^2) := \inf_{\gamma} \bar{J}^{(\gamma)}(m, k^2, \sigma_0^2). \qquad (2)$$

We note that for the scalar case of $m = 1$, the problem is Witsenhausen's original counterexample [1].
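As an illustration of the problem statement, the following Monte Carlo sketch estimates the two normalized stage costs for an arbitrary pair of control functions. It is a minimal sketch, not part of the original development: the function name `average_costs`, the trial count and the seed are assumptions, unit observation-noise variance is assumed, and the two test strategies are the simple linear ones (zero-forcing and zero-input with a linear MMSE estimate), whose costs are known in closed form.

```python
import random

def average_costs(gamma1, gamma2, m, k, sigma0, trials=20000, seed=0):
    """Monte Carlo estimate of the two normalized stage costs (sigma_Z = 1)."""
    rng = random.Random(seed)
    j1 = j2 = 0.0
    for _ in range(trials):
        x0 = [rng.gauss(0.0, sigma0) for _ in range(m)]
        u1 = gamma1(x0)                                  # first controller sees x0
        x1 = [a + b for a, b in zip(x0, u1)]
        y2 = [a + rng.gauss(0.0, 1.0) for a in x1]       # noisy observation of x1
        u2 = gamma2(y2)                                  # second controller sees y2
        x2 = [a - b for a, b in zip(x1, u2)]
        j1 += k ** 2 * sum(u * u for u in u1) / m
        j2 += sum(x * x for x in x2) / m
    return j1 / trials, j2 / trials

m, k, sigma0 = 2, 0.5, 3.0

# Zero-forcing: cancel the state at the first stage, do nothing afterwards.
zf = average_costs(lambda x0: [-x for x in x0], lambda y2: [0.0] * len(y2), m, k, sigma0)

# Zero-input: no first-stage action, linear MMSE estimate at the second stage.
gain = sigma0 ** 2 / (sigma0 ** 2 + 1.0)
zi = average_costs(lambda x0: [0.0] * len(x0), lambda y2: [gain * y for y in y2], m, k, sigma0)

print("zero-forcing total cost:", sum(zf), "(closed form:", k ** 2 * sigma0 ** 2, ")")
print("zero-input total cost:  ", sum(zi), "(closed form:", gain, ")")
```

Any nonlinear strategy can be plugged in through `gamma1` and `gamma2`; the normalization by `m` mirrors the definition of $J_1$ and $J_2$ above.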

Observe that scaling $\sigma_0$ and $\sigma_Z$ by the same factor essentially does not change the problem: the
solution can also be scaled by the same factor (with the resulting cost scaling quadratically with it). Thus,
without loss of generality, we assume that the variance of the Gaussian observation noise is $\sigma_Z^2 = 1$ (as
is also assumed in [1]). The pdf of the noise $\mathbf{Z}^m$ is denoted by $f_Z(\cdot)$. In our proof techniques, we also
consider a hypothetical observation noise $\mathbf{Z}_G^m \sim \mathcal{N}(0, \sigma_G^2 I_m)$ with the variance $\sigma_G^2 \ge 1$. The pdf of this test
noise is denoted by $f_G(\cdot)$. We use $\psi(m, r)$ to denote $\Pr(\|\mathbf{Z}^m\| \ge r)$ for $\mathbf{Z}^m \sim \mathcal{N}(0, I_m)$.

Subscripts in expectation expressions denote the random variable being averaged over (e.g. $E_{\mathbf{X}_0^m, \mathbf{Z}_G^m}[\cdot]$
denotes averaging over the initial state $\mathbf{X}_0^m$ and the test noise $\mathbf{Z}_G^m$).

Fig. 2. Covering and packing for the 2-dimensional hexagonal lattice. The packing-covering ratio for this lattice is $\xi = \frac{2}{\sqrt{3}} \approx 1.15$ [44,
Appendix C]. The first controller forces the initial state $\mathbf{x}_0^m$ to the lattice point nearest to it. The second controller estimates $\mathbf{x}_1^m$ to be a
lattice point at the centre of the sphere if it falls in one of the packing spheres. Else it essentially gives up and estimates $\hat{\mathbf{x}}_1^m = \mathbf{y}_2^m$, the
received output itself. A hexagonal lattice-based scheme would perform better for the 2-D Witsenhausen problem than the square lattice (of
$\xi = \sqrt{2} \approx 1.41$ [44, Appendix C]) because it has a smaller $\xi$.

III. Lattice-based quantization strategies

Lattice-based quantization strategies are the natural generalizations of scalar quantization-based
strategies [9]. An introduction to lattices can be found in [45], [46]. Relevant definitions are reviewed below.
$\mathcal{B}$ denotes the unit ball in $\mathbb{R}^m$.

Definition 1 (Lattice): An $m$-dimensional lattice $\Lambda$ is a set of points in $\mathbb{R}^m$ such that if $\mathbf{x}^m, \mathbf{y}^m \in \Lambda$,
then $\mathbf{x}^m + \mathbf{y}^m \in \Lambda$, and if $\mathbf{x}^m \in \Lambda$, then $-\mathbf{x}^m \in \Lambda$.

Definition 2 (Packing and packing radius): Given an $m$-dimensional lattice $\Lambda$ and a radius $r$, the
set $\Lambda + r\mathcal{B}$ is a packing of Euclidean $m$-space if for all distinct points $\mathbf{x}^m, \mathbf{y}^m \in \Lambda$, $(\mathbf{x}^m + r\mathcal{B}) \cap (\mathbf{y}^m + r\mathcal{B}) = \emptyset$.
The packing radius $r_p$ is defined as $r_p := \sup\{r : \Lambda + r\mathcal{B} \text{ is a packing}\}$.

Definition 3 (Covering and covering radius): Given an $m$-dimensional lattice $\Lambda$ and a radius $r$, the
set $\Lambda + r\mathcal{B}$ is a covering of Euclidean $m$-space if $\mathbb{R}^m \subseteq \Lambda + r\mathcal{B}$. The covering radius $r_c$ is defined as
$r_c := \inf\{r : \Lambda + r\mathcal{B} \text{ is a covering}\}$.

Definition 4 (Packing-covering ratio): The packing-covering ratio (denoted by $\xi$) of a lattice $\Lambda$ is the
ratio of its covering radius to its packing radius, $\xi = \frac{r_c}{r_p}$.

Because it creates no ambiguity, we do not include the dimension $m$ and the choice of lattice $\Lambda$ in the
notation of $r_c$, $r_p$ and $\xi$, though these quantities depend on $m$ and $\Lambda$.
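The definitions above can be made concrete for two-dimensional lattices. The sketch below is a brute-force illustration, not an efficient lattice algorithm: the function name, the grid resolution and the search window `reach` are assumptions, and it presumes the two basis vectors are short enough (a reasonably reduced basis) that the shortest lattice vector and all nearest points lie within the small window searched.

```python
import math

def packing_covering_2d(b1, b2, grid=60, reach=2):
    """Brute-force packing radius, covering radius and their ratio for a 2-D lattice."""
    coeffs = range(-reach, reach + 1)
    points = [(i * b1[0] + j * b2[0], i * b1[1] + j * b2[1]) for i in coeffs for j in coeffs]
    # Packing radius: half the length of the shortest nonzero lattice vector.
    r_p = 0.5 * min(
        math.hypot(i * b1[0] + j * b2[0], i * b1[1] + j * b2[1])
        for i in coeffs for j in coeffs if (i, j) != (0, 0)
    )
    # Covering radius: largest distance from a point of the fundamental
    # parallelogram to its nearest lattice point (approximated on a grid).
    r_c = 0.0
    for a in range(grid):
        for b in range(grid):
            s, t = a / grid, b / grid
            px, py = s * b1[0] + t * b2[0], s * b1[1] + t * b2[1]
            nearest = min(math.hypot(px - x, py - y) for (x, y) in points)
            r_c = max(r_c, nearest)
    return r_p, r_c, r_c / r_p

_, _, xi_square = packing_covering_2d((1.0, 0.0), (0.0, 1.0))
_, _, xi_hex = packing_covering_2d((1.0, 0.0), (0.5, math.sqrt(3.0) / 2.0))
print("square lattice    xi ~", xi_square)
print("hexagonal lattice xi ~", xi_hex)
```

The grid of 60 steps is chosen so that the deepest holes of both lattices (the cell centre for the square lattice, the triangle centroids for the hexagonal one) fall on grid points; for other lattices the grid value is only a lower estimate of the covering radius.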

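For a given dimension $m$, a natural control strategy that uses a lattice $\Lambda$ of covering radius $r_c$ and
packing radius $r_p$ is as follows. The first controller uses the input $\mathbf{u}_1^m$ to force the state $\mathbf{x}_0^m$ to the lattice
point nearest to $\mathbf{x}_0^m$. The second controller estimates $\mathbf{x}_1^m$ to be the lattice point nearest to $\mathbf{y}_2^m$. For analytical
ease, we instead consider an inferior strategy where the second controller estimates $\mathbf{x}_1^m$ to be a lattice
point only if the lattice point lies within the sphere of radius $r_p$ around $\mathbf{y}_2^m$. If no lattice point exists in
the sphere, the second controller estimates $\mathbf{x}_1^m$ to be $\mathbf{y}_2^m$, the received vector itself. The actions $\gamma_1(\cdot)$ of
$C_1$ and $\gamma_2(\cdot)$ of $C_2$ are therefore given by

$$\gamma_1(\mathbf{x}_0^m) = -\mathbf{x}_0^m + \arg\min_{\mathbf{x}_1^m \in \Lambda} \|\mathbf{x}_1^m - \mathbf{x}_0^m\|^2,$$

$$\gamma_2(\mathbf{y}_2^m) = \begin{cases} \tilde{\mathbf{x}}_1^m & \text{if } \exists\, \tilde{\mathbf{x}}_1^m \in \Lambda \text{ s.t. } \|\mathbf{y}_2^m - \tilde{\mathbf{x}}_1^m\|^2 < r_p^2, \\ \mathbf{y}_2^m & \text{otherwise.} \end{cases}$$

The event where there exists no such $\tilde{\mathbf{x}}_1^m \in \Lambda$ is referred to as decoding failure. In the following, we
denote $\gamma_2(\mathbf{y}_2^m)$ by $\hat{\mathbf{x}}_1^m$, the estimate of $\mathbf{x}_1^m$.

Theorem 1: Using a lattice-based strategy (as described above) for $W(m, k^2, \sigma_0^2)$ with $r_c$ and $r_p$ the
covering and the packing radius for the lattice, the total average cost is upper bounded by

$$\bar{J}^{(\gamma)}(m, k^2, \sigma_0^2) \le \inf_{P \ge 0} k^2 P + \left(\sqrt{\psi(m+2, r_p)} + \sqrt{\frac{P}{\xi^2}} \sqrt{\psi(m, r_p)}\right)^2,$$

where $\xi = \frac{r_c}{r_p}$ is the packing-covering ratio for the lattice, and $\psi(m, r) = \Pr(\|\mathbf{Z}^m\| \ge r)$. The following
looser bound also holds

$$\bar{J}^{(\gamma)}(m, k^2, \sigma_0^2) \le \inf_{P > \xi^2} k^2 P + \left(1 + \sqrt{\frac{P}{\xi^2}}\right)^2 e^{-\frac{mP}{2\xi^2} + \frac{m+2}{2}\left(1 + \ln\left(\frac{P}{\xi^2}\right)\right)}.$$

Remark: The latter loose bound is useful for analytical manipulations when proving explicit bounds on
the ratio of the upper and lower bounds in Section V.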
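The first bound of Theorem 1 is straightforward to evaluate numerically. The sketch below is illustrative only: the function names, the grid search over $P$ and its range are assumptions, and it uses the relation $r_p^2 = mP/\xi^2$ from Lemma 1. The tail probability $\psi(m, r)$ is a chi-square tail, computed here with the standard recurrence for the regularized upper incomplete gamma function so that only the Python standard library is needed.

```python
import math

def psi(m, r):
    """psi(m, r) = Pr(||Z^m|| >= r) for Z^m ~ N(0, I_m), i.e. a chi-square tail."""
    x = 0.5 * r * r
    if x == 0.0:
        return 1.0
    # Regularized upper incomplete gamma Q(m/2, x), built up from a closed-form
    # base case with the recurrence Q(a + 1, x) = Q(a, x) + x^a e^{-x} / Gamma(a + 1).
    if m % 2 == 0:
        a, q = 1.0, math.exp(-x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))
    while a < 0.5 * m - 1e-9:
        q += math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)

def theorem1_bound(m, k, xi, grid=4000, p_max=200.0):
    """Grid approximation of inf_P k^2 P + (sqrt(psi(m+2, r_p)) + sqrt(P)/xi * sqrt(psi(m, r_p)))^2."""
    best = float("inf")
    for i in range(grid + 1):
        P = p_max * i / grid
        r_p = math.sqrt(m * P) / xi          # r_p^2 = m P / xi^2
        second = (math.sqrt(psi(m + 2, r_p)) + math.sqrt(P) / xi * math.sqrt(psi(m, r_p))) ** 2
        best = min(best, k * k * P + second)
    return best

# Scalar case with the integer lattice, for which r_c = r_p and hence xi = 1.
bound = theorem1_bound(m=1, k=0.01, xi=1.0)
print("Theorem 1 bound for m = 1, k = 0.01:", bound)
```

Because the theorem charges the worst-case first-stage cost $k^2 r_c^2 / m$ rather than the average over a quantization bin, this value is looser than what an exact calculation for a specific lattice would give.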

Proof: Note that because $\Lambda$ has a covering radius of $r_c$, $\|\mathbf{x}_1^m - \mathbf{x}_0^m\|^2 \le r_c^2$. Thus the first stage
cost is bounded above by $\frac{1}{m} k^2 r_c^2$. A tighter bound can be provided for a specific lattice and finite $m$ (for
example, for $m = 1$, the first stage cost is approximately $k^2 \frac{r_c^2}{3}$ if $r_c^2 \ll \sigma_0^2$ because the distribution of
$\mathbf{x}_0^m$ conditioned on it lying in any of the quantization bins is approximately uniform at least for the most
likely bins).

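For the second stage, observe that

$$E_{\mathbf{X}_0^m, \mathbf{Z}^m}\left[\|\mathbf{X}_1^m - \hat{\mathbf{X}}_1^m\|^2\right] = E_{\mathbf{X}_1^m}\left[E_{\mathbf{Z}^m}\left[\|\mathbf{X}_1^m - \hat{\mathbf{X}}_1^m\|^2 \mid \mathbf{X}_1^m\right]\right]. \qquad (3)$$

Denote by $\mathcal{E}_m$ the event $\{\|\mathbf{Z}^m\|^2 \ge r_p^2\}$. Observe that under the event $\mathcal{E}_m^c$, $\hat{\mathbf{X}}_1^m = \mathbf{X}_1^m$, resulting in a zero
second-stage cost. Thus,

$$E_{\mathbf{Z}^m}\left[\|\mathbf{X}_1^m - \hat{\mathbf{X}}_1^m\|^2 \mid \mathbf{X}_1^m\right] = E_{\mathbf{Z}^m}\left[\|\mathbf{X}_1^m - \hat{\mathbf{X}}_1^m\|^2 1_{\{\mathcal{E}_m\}} \mid \mathbf{X}_1^m\right] + E_{\mathbf{Z}^m}\left[\|\mathbf{X}_1^m - \hat{\mathbf{X}}_1^m\|^2 1_{\{\mathcal{E}_m^c\}} \mid \mathbf{X}_1^m\right]$$
$$= E_{\mathbf{Z}^m}\left[\|\mathbf{X}_1^m - \hat{\mathbf{X}}_1^m\|^2 1_{\{\mathcal{E}_m\}} \mid \mathbf{X}_1^m\right].$$

We now bound the squared-error under the error event $\mathcal{E}_m$, when either $\mathbf{x}_1^m$ is decoded erroneously, or
there is a decoding failure. If $\mathbf{x}_1^m$ is decoded erroneously to a lattice point $\tilde{\mathbf{x}}_1^m \ne \mathbf{x}_1^m$, the squared-error
can be bounded as follows

$$\|\mathbf{x}_1^m - \tilde{\mathbf{x}}_1^m\|^2 = \|\mathbf{x}_1^m - \mathbf{y}_2^m + \mathbf{y}_2^m - \tilde{\mathbf{x}}_1^m\|^2 \le \left(\|\mathbf{x}_1^m - \mathbf{y}_2^m\| + \|\mathbf{y}_2^m - \tilde{\mathbf{x}}_1^m\|\right)^2 \le \left(\|\mathbf{z}^m\| + r_p\right)^2.$$

If $\mathbf{x}_1^m$ is decoded as $\mathbf{y}_2^m$, the squared-error is simply $\|\mathbf{z}^m\|^2$, which we also upper bound by $(\|\mathbf{z}^m\| + r_p)^2$.
Thus, under event $\mathcal{E}_m$, the squared error $\|\mathbf{x}_1^m - \hat{\mathbf{x}}_1^m\|^2$ is bounded above by $(\|\mathbf{z}^m\| + r_p)^2$, and hence

$$E_{\mathbf{Z}^m}\left[\|\mathbf{X}_1^m - \hat{\mathbf{X}}_1^m\|^2 \mid \mathbf{X}_1^m\right] \le E_{\mathbf{Z}^m}\left[(\|\mathbf{Z}^m\| + r_p)^2 1_{\{\mathcal{E}_m\}} \mid \mathbf{X}_1^m\right] \overset{(a)}{=} E_{\mathbf{Z}^m}\left[(\|\mathbf{Z}^m\| + r_p)^2 1_{\{\mathcal{E}_m\}}\right], \qquad (4)$$

where (a) uses the fact that the pair $(\mathbf{Z}^m, 1_{\{\mathcal{E}_m\}})$ is independent of $\mathbf{X}_1^m$. Now, let $P = \frac{r_c^2}{m}$, so that the first
stage cost is at most $k^2 P$. The following lemma helps us derive the upper bound.

Lemma 1: For a given lattice with $r_p^2 = \frac{r_c^2}{\xi^2} = \frac{mP}{\xi^2}$, the following bound holds

$$\frac{1}{m} E_{\mathbf{Z}^m}\left[(\|\mathbf{Z}^m\| + r_p)^2 1_{\{\mathcal{E}_m\}}\right] \le \left(\sqrt{\psi(m+2, r_p)} + \sqrt{\frac{P}{\xi^2}} \sqrt{\psi(m, r_p)}\right)^2.$$

The following (looser) bound also holds as long as $P > \xi^2$,

$$\frac{1}{m} E_{\mathbf{Z}^m}\left[(\|\mathbf{Z}^m\| + r_p)^2 1_{\{\mathcal{E}_m\}}\right] \le \left(1 + \sqrt{\frac{P}{\xi^2}}\right)^2 e^{-\frac{mP}{2\xi^2} + \frac{m+2}{2}\left(1 + \ln\left(\frac{P}{\xi^2}\right)\right)}.$$

Proof: See Appendix I.

The theorem now follows from (3), (4) and Lemma 1.

Fig. 3. A pictorial representation of the proof for the lower bound assuming $\sigma_0^2 = 30$. The solid curves show the vector lower bound of [15]
for various values of observation noise variances, denoted by $\sigma_G^2$. Conceptually, multiplying these curves by the probability of that channel
behavior yields the shadow curves for the particular $\sigma_G^2$, shown by dashed curves. The scalar lower bound is then obtained by taking the
maximum of these shadow curves. The circles at points along the scalar bound curve indicate the optimizing value of $\sigma_G$ for obtaining that
point on the bound.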



IV. Lower bounds on the cost 

Bansal and Basar [3] use information-theoretic techniques related to rate-distortion and channel capacity
to show the optimality of linear strategies in a modified version of Witsenhausen's counterexample where
the cost function does not contain a product of two decision variables. Following the same spirit, in [15]
we derive the following lower bound for Witsenhausen's counterexample itself.

Theorem 2: For $W(m, k^2, \sigma_0^2)$, if for a strategy $\gamma(\cdot)$ the average power $\frac{1}{m} E_{\mathbf{X}_0^m}\left[\|\mathbf{U}_1^m\|^2\right] = P$, the
following lower bound holds on the second stage cost

$$\bar{J}_2^{(\gamma)}(m, k^2, \sigma_0^2) \ge \left(\left(\sqrt{\kappa(P, \sigma_0^2)} - \sqrt{P}\right)^+\right)^2,$$

where $(\cdot)^+$ is shorthand for $\max(\cdot, 0)$ and

$$\kappa(P, \sigma_0^2) = \frac{\sigma_0^2}{(\sigma_0 + \sqrt{P})^2 + 1}. \qquad (5)$$

The following lower bound thus holds on the total cost

$$\bar{J}_{\min}(m, k^2, \sigma_0^2) \ge \inf_{P \ge 0} k^2 P + \left(\left(\sqrt{\kappa(P, \sigma_0^2)} - \sqrt{P}\right)^+\right)^2.$$

Proof: We refer the reader to [15] for the full proof. We outline it here because these ideas are used
in the derivation of the new lower bound in Theorem 3.
Using a triangle inequality argument, we show

$$\sqrt{\frac{1}{m} E_{\mathbf{X}_0^m, \mathbf{Z}^m}\left[\|\mathbf{X}_0^m - \hat{\mathbf{X}}_1^m\|^2\right]} \le \sqrt{\frac{1}{m} E_{\mathbf{X}_0^m, \mathbf{Z}^m}\left[\|\mathbf{X}_0^m - \mathbf{X}_1^m\|^2\right]} + \sqrt{\frac{1}{m} E_{\mathbf{X}_0^m, \mathbf{Z}^m}\left[\|\mathbf{X}_1^m - \hat{\mathbf{X}}_1^m\|^2\right]}. \qquad (6)$$

The first term on the RHS is $\sqrt{P}$. It therefore suffices to lower bound the term on the LHS to obtain a
lower bound on $E_{\mathbf{X}_0^m, \mathbf{Z}^m}\left[\|\mathbf{X}_1^m - \hat{\mathbf{X}}_1^m\|^2\right]$. To that end, we interpret $\hat{\mathbf{X}}_1^m$ as an estimate for $\mathbf{X}_0^m$, which is a
problem of transmitting a source across a channel. For an iid Gaussian source to be transmitted across a
memoryless power-constrained additive-noise Gaussian channel (with one channel use per source symbol),
the optimal strategy that minimizes the mean-square error is merely scaling the source symbol so that the
average power constraint is met [47]. The estimation at the second controller is then merely the linear
MMSE estimation of $\mathbf{X}_0^m$, and the obtained MMSE is $\kappa(P, \sigma_0^2)$. The theorem now follows from (6). ■

Observe that the lower bound expression is the same for all vector lengths. In the following,
large-deviation arguments [48], [49] (called sphere-packing style arguments for historical reasons) are extended
following [41]-[43] to a joint source-channel setting where the distortion measure is unbounded. The
obtained bounds are tighter than those in Theorem 2 and depend explicitly on the vector length $m$.
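The Theorem 2 bound on the total cost is a one-dimensional minimization over $P$ and can be approximated on a grid. The sketch below is illustrative: the function names and the grid resolution are assumptions, and the search range $[0, 1]$ relies on the observation that $\kappa(P, \sigma_0^2) < 1$, so that the second term vanishes for $P \ge 1$ and the objective only grows from there. The two linear strategies used for comparison are zero-forcing (cost $k^2 \sigma_0^2$) and zero-input followed by linear estimation (cost $\sigma_0^2 / (\sigma_0^2 + 1)$).

```python
import math

def kappa(P, sigma0_sq):
    """kappa(P, sigma_0^2) as defined in (5)."""
    return sigma0_sq / ((math.sqrt(sigma0_sq) + math.sqrt(P)) ** 2 + 1.0)

def theorem2_bound(k, sigma0_sq, grid=20000):
    """Grid approximation of inf_P k^2 P + ((sqrt(kappa) - sqrt(P))^+)^2."""
    best = float("inf")
    for i in range(grid + 1):
        P = i / grid                      # kappa < 1, so P in [0, 1] is enough
        gap = max(math.sqrt(kappa(P, sigma0_sq)) - math.sqrt(P), 0.0)
        best = min(best, k * k * P + gap * gap)
    return best

k, sigma0_sq = 0.1, 25.0
lower = theorem2_bound(k, sigma0_sq)
zero_forcing = k * k * sigma0_sq
zero_input = sigma0_sq / (sigma0_sq + 1.0)
print("Theorem 2 lower bound (grid):", lower)
print("best of the two linear strategies:", min(zero_forcing, zero_input))
```

A grid minimum can only overestimate the true infimum slightly, so the printed value should be read as an approximation of the bound rather than a certified lower bound.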
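Theorem 3: For $W(m, k^2, \sigma_0^2)$, if for a strategy $\gamma(\cdot)$ the average power $\frac{1}{m} E_{\mathbf{X}_0^m}\left[\|\mathbf{U}_1^m\|^2\right] = P$, the
following lower bound holds on the second stage cost for any choice of $\sigma_G^2 \ge 1$ and $L > 0$

$$\bar{J}_2^{(\gamma)}(m, k^2, \sigma_0^2) \ge \eta(P, \sigma_0^2, \sigma_G^2, L),$$

where

$$\eta(P, \sigma_0^2, \sigma_G^2, L) = \frac{\sigma_G^m}{c_m(L)} \exp\left(-\frac{mL^2(\sigma_G^2 - 1)}{2}\right) \left(\left(\sqrt{\kappa_2(P, \sigma_0^2, \sigma_G^2, L)} - \sqrt{P}\right)^+\right)^2,$$

where

$$\kappa_2(P, \sigma_0^2, \sigma_G^2, L) := \frac{\sigma_0^2 \sigma_G^2}{c_m^{2/m}(L)\, e^{1 - d_m(L)} \left((\sigma_0 + \sqrt{P})^2 + d_m(L) \sigma_G^2\right)},$$

$$c_m(L) := \frac{1}{\Pr(\|\mathbf{Z}^m\|^2 \le mL^2)} = \left(1 - \psi(m, L\sqrt{m})\right)^{-1}, \qquad d_m(L) := \frac{\Pr(\|\mathbf{Z}^{m+2}\|^2 \le mL^2)}{\Pr(\|\mathbf{Z}^m\|^2 \le mL^2)} = \frac{1 - \psi(m+2, L\sqrt{m})}{1 - \psi(m, L\sqrt{m})},$$

$0 < d_m(L) < 1$, and $\psi(m, r) = \Pr(\|\mathbf{Z}^m\| \ge r)$. Thus the following lower bound holds on the total cost

$$\bar{J}_{\min}(m, k^2, \sigma_0^2) \ge \inf_{P \ge 0} k^2 P + \eta(P, \sigma_0^2, \sigma_G^2, L), \qquad (7)$$

for any choice of $\sigma_G^2 \ge 1$ and $L > 0$ (the choice can depend on $P$). Further, these bounds are at least as
tight as those of Theorem 2 for all values of $k$ and $\sigma_0^2$.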
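The quantities in Theorem 3 can be evaluated directly from the chi-square tail $\psi$. The following sketch is illustrative only: all function names are assumptions, and $\psi$ is computed with the standard incomplete-gamma recurrence so that nothing beyond the standard library is required. The last lines check the limiting behaviour noted in the theorem: with $\sigma_G^2 = 1$ and a large $L$, $\eta$ should essentially coincide with the Theorem 2 expression.

```python
import math

def psi(m, r):
    """psi(m, r) = Pr(||Z^m|| >= r) for Z^m ~ N(0, I_m)."""
    x = 0.5 * r * r
    if x == 0.0:
        return 1.0
    # Q(a + 1, x) = Q(a, x) + x^a e^{-x} / Gamma(a + 1), starting from a closed form.
    if m % 2 == 0:
        a, q = 1.0, math.exp(-x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))
    while a < 0.5 * m - 1e-9:
        q += math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)

def c_m(m, L):
    return 1.0 / (1.0 - psi(m, L * math.sqrt(m)))

def d_m(m, L):
    return (1.0 - psi(m + 2, L * math.sqrt(m))) / (1.0 - psi(m, L * math.sqrt(m)))

def kappa2(P, sigma0_sq, sigmaG_sq, L, m):
    c, d = c_m(m, L), d_m(m, L)
    denom = c ** (2.0 / m) * math.exp(1.0 - d) * ((math.sqrt(sigma0_sq) + math.sqrt(P)) ** 2 + d * sigmaG_sq)
    return sigma0_sq * sigmaG_sq / denom

def eta(P, sigma0_sq, sigmaG_sq, L, m):
    gap = max(math.sqrt(kappa2(P, sigma0_sq, sigmaG_sq, L, m)) - math.sqrt(P), 0.0)
    prefactor = sigmaG_sq ** (m / 2.0) / c_m(m, L) * math.exp(-m * L * L * (sigmaG_sq - 1.0) / 2.0)
    return prefactor * gap * gap

# Limiting check: sigma_G^2 = 1 and large L should recover the Theorem 2 expression.
P, sigma0_sq = 0.1, 25.0
theorem2_value = max(math.sqrt(sigma0_sq / ((math.sqrt(sigma0_sq) + math.sqrt(P)) ** 2 + 1.0)) - math.sqrt(P), 0.0) ** 2
print("eta with sigma_G^2 = 1, L = 8:", eta(P, sigma0_sq, 1.0, 8.0, 1))
print("Theorem 2 expression:         ", theorem2_value)
```

In practice one would maximize `eta` over a grid of $(\sigma_G^2, L)$ pairs for each $P$, which corresponds to taking the maximum of the shadow curves described in Fig. 3.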

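Proof: From Theorem 2, for a given $P$, a lower bound on the average second stage cost is
$\left(\left(\sqrt{\kappa} - \sqrt{P}\right)^+\right)^2$. We derive another lower bound that is equal to the expression for $\eta(P, \sigma_0^2, \sigma_G^2, L)$.
The high-level intuition behind this lower bound is presented in Fig. 3.

Define $S_L^G := \{\mathbf{z}^m : \|\mathbf{z}^m\|^2 \le mL^2\sigma_G^2\}$ and use subscripts to denote which probability model is being
used for the second stage observation noise. $Z$ denotes white Gaussian of variance 1 while $G$ denotes
white Gaussian of variance $\sigma_G^2 \ge 1$. With $f_0(\cdot)$ denoting the pdf of $\mathbf{X}_0^m$,

$$E_{\mathbf{X}_0^m, \mathbf{Z}^m}\left[J_2^{(\gamma)}(\mathbf{X}_0^m, \mathbf{Z}^m)\right] = \int_{\mathbf{z}^m} \int_{\mathbf{x}_0^m} J_2^{(\gamma)}(\mathbf{x}_0^m, \mathbf{z}^m) f_0(\mathbf{x}_0^m) f_Z(\mathbf{z}^m)\, d\mathbf{x}_0^m\, d\mathbf{z}^m$$
$$\ge \int_{\mathbf{z}^m \in S_L^G} \left(\int_{\mathbf{x}_0^m} J_2^{(\gamma)}(\mathbf{x}_0^m, \mathbf{z}^m) f_0(\mathbf{x}_0^m)\, d\mathbf{x}_0^m\right) f_Z(\mathbf{z}^m)\, d\mathbf{z}^m$$
$$= \int_{\mathbf{z}^m \in S_L^G} \left(\int_{\mathbf{x}_0^m} J_2^{(\gamma)}(\mathbf{x}_0^m, \mathbf{z}^m) f_0(\mathbf{x}_0^m)\, d\mathbf{x}_0^m\right) \frac{f_Z(\mathbf{z}^m)}{f_G(\mathbf{z}^m)} f_G(\mathbf{z}^m)\, d\mathbf{z}^m. \qquad (8)$$

The ratio of the two probability density functions is given by

$$\frac{f_Z(\mathbf{z}^m)}{f_G(\mathbf{z}^m)} = \frac{e^{-\frac{\|\mathbf{z}^m\|^2}{2}}}{\left(\sqrt{2\pi}\right)^m} \cdot \frac{\left(\sqrt{2\pi\sigma_G^2}\right)^m}{e^{-\frac{\|\mathbf{z}^m\|^2}{2\sigma_G^2}}} = \sigma_G^m\, e^{-\frac{\|\mathbf{z}^m\|^2}{2}\left(1 - \frac{1}{\sigma_G^2}\right)}.$$

Observe that for $\mathbf{z}^m \in S_L^G$, $\|\mathbf{z}^m\|^2 \le mL^2\sigma_G^2$. Using $\sigma_G^2 \ge 1$, we obtain

$$\frac{f_Z(\mathbf{z}^m)}{f_G(\mathbf{z}^m)} \ge \sigma_G^m\, e^{-\frac{mL^2\sigma_G^2}{2}\left(1 - \frac{1}{\sigma_G^2}\right)} = \sigma_G^m\, e^{-\frac{mL^2(\sigma_G^2 - 1)}{2}}. \qquad (9)$$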



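Using (8) and (9),

$$E_{\mathbf{X}_0^m, \mathbf{Z}^m}\left[J_2^{(\gamma)}(\mathbf{X}_0^m, \mathbf{Z}^m)\right] \ge \sigma_G^m\, e^{-\frac{mL^2(\sigma_G^2 - 1)}{2}} \int_{\mathbf{z}^m \in S_L^G} \left(\int_{\mathbf{x}_0^m} J_2^{(\gamma)}(\mathbf{x}_0^m, \mathbf{z}^m) f_0(\mathbf{x}_0^m)\, d\mathbf{x}_0^m\right) f_G(\mathbf{z}^m)\, d\mathbf{z}^m$$
$$= \sigma_G^m\, e^{-\frac{mL^2(\sigma_G^2 - 1)}{2}}\, E_{\mathbf{X}_0^m, \mathbf{Z}_G^m}\left[J_2^{(\gamma)}(\mathbf{X}_0^m, \mathbf{Z}_G^m)\, 1_{\{\mathbf{Z}_G^m \in S_L^G\}}\right]$$
$$= \sigma_G^m\, e^{-\frac{mL^2(\sigma_G^2 - 1)}{2}}\, E_{\mathbf{X}_0^m, \mathbf{Z}_G^m}\left[J_2^{(\gamma)}(\mathbf{X}_0^m, \mathbf{Z}_G^m) \mid \mathbf{Z}_G^m \in S_L^G\right] \Pr(\mathbf{Z}_G^m \in S_L^G). \qquad (10)$$

Analyzing the probability term in (10),

$$\Pr(\mathbf{Z}_G^m \in S_L^G) = \Pr\left(\|\mathbf{Z}_G^m\|^2 \le mL^2\sigma_G^2\right) = \Pr\left(\left(\frac{\|\mathbf{Z}_G^m\|}{\sigma_G}\right)^2 \le mL^2\right)$$
$$= 1 - \Pr\left(\left(\frac{\|\mathbf{Z}_G^m\|}{\sigma_G}\right)^2 > mL^2\right) = 1 - \psi(m, L\sqrt{m}) = \frac{1}{c_m(L)}, \qquad (11)$$

because $\frac{\mathbf{Z}_G^m}{\sigma_G} \sim \mathcal{N}(0, I_m)$. From (10) and (11),

$$E_{\mathbf{X}_0^m, \mathbf{Z}^m}\left[J_2^{(\gamma)}(\mathbf{X}_0^m, \mathbf{Z}^m)\right] \ge \sigma_G^m\, e^{-\frac{mL^2(\sigma_G^2 - 1)}{2}}\, E_{\mathbf{X}_0^m, \mathbf{Z}_G^m}\left[J_2^{(\gamma)}(\mathbf{X}_0^m, \mathbf{Z}_G^m) \mid \mathbf{Z}_G^m \in S_L^G\right] \left(1 - \psi(m, L\sqrt{m})\right)$$
$$= \frac{\sigma_G^m\, e^{-\frac{mL^2(\sigma_G^2 - 1)}{2}}}{c_m(L)}\, E_{\mathbf{X}_0^m, \mathbf{Z}_G^m}\left[J_2^{(\gamma)}(\mathbf{X}_0^m, \mathbf{Z}_G^m) \mid \mathbf{Z}_G^m \in S_L^G\right]. \qquad (12)$$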



We now need the following lemma, which connects the new finite-length lower bound to the infinite-length
lower bound of [15].

Lemma 2:

$$E_{\mathbf{X}_0^m, \mathbf{Z}_G^m}\left[J_2^{(\gamma)}(\mathbf{X}_0^m, \mathbf{Z}_G^m) \mid \mathbf{Z}_G^m \in S_L^G\right] \ge \left(\left(\sqrt{\kappa_2(P, \sigma_0^2, \sigma_G^2, L)} - \sqrt{P}\right)^+\right)^2$$

for any $L > 0$.

Proof: See Appendix II.

The lower bound on the total average cost now follows from (12) and Lemma 2.
We now verify that $d_m(L) \in (0, 1)$. That $d_m(L) > 0$ is clear from definition. $d_m(L) < 1$ because
$\{\mathbf{z}^{m+2} : \|\mathbf{z}^{m+2}\|^2 \le mL^2\sigma_G^2\} \subset \{\mathbf{z}^{m+2} : \|\mathbf{z}^m\|^2 \le mL^2\sigma_G^2\}$, i.e., a sphere sits inside a cylinder.

Finally we verify that this new lower bound is at least as tight as the one in Theorem 2. Choosing
$\sigma_G^2 = 1$ in the expression for $\eta(P, \sigma_0^2, \sigma_G^2, L)$,

$$\sup_{\sigma_G^2 \ge 1,\, L > 0} \eta(P, \sigma_0^2, \sigma_G^2, L) \ge \sup_{L > 0} \frac{1}{c_m(L)} \left(\left(\sqrt{\kappa_2(P, \sigma_0^2, 1, L)} - \sqrt{P}\right)^+\right)^2.$$

Now notice that $c_m(L)$ and $d_m(L)$ converge to 1 as $L \to \infty$. Thus $\kappa_2(P, \sigma_0^2, 1, L) \to \kappa(P, \sigma_0^2)$ and
therefore, the supremum of $\eta(P, \sigma_0^2, \sigma_G^2, L)$ over $\sigma_G^2 \ge 1$ and $L > 0$ is lower bounded by
$\left(\left(\sqrt{\kappa} - \sqrt{P}\right)^+\right)^2$, the lower bound in Theorem 2.

V. Combination of linear and lattice-based strategies attain within a constant factor of the optimal cost

Theorem 4 (Constant-factor optimality): The costs for $W(m, k^2, \sigma_0^2)$ are bounded as follows

$$\inf_{P \ge 0} \sup_{\sigma_G^2 \ge 1,\, L > 0} k^2 P + \eta(P, \sigma_0^2, \sigma_G^2, L) \le \bar{J}_{\min}(m, k^2, \sigma_0^2) \le \mu \left(\inf_{P \ge 0} \sup_{\sigma_G^2 \ge 1,\, L > 0} k^2 P + \eta(P, \sigma_0^2, \sigma_G^2, L)\right),$$

where $\mu = 100\xi^2$, $\xi$ is the packing-covering ratio of any lattice in $\mathbb{R}^m$, and $\eta(\cdot)$ is as defined in Theorem 3.
For any $m$, $\mu < 1600$. Further, depending on the $(m, k^2, \sigma_0^2)$ values, the upper bound can be attained by
lattice-based quantization strategies or linear strategies. For $m = 1$, a numerical calculation (MATLAB
code available at [50]) shows that $\mu < 8$ (see Fig. 5).

Proof: Let $P^*$ denote the power $P$ in the lower bound in Theorem 3. We show here that for any
choice of $P^*$, the ratio of the upper and the lower bound is bounded.

Consider the two simple linear strategies of zero-forcing ($\mathbf{u}_1^m = -\mathbf{x}_0^m$) and zero-input ($\mathbf{u}_1^m = 0$) followed
by LLSE estimation at $C_2$. It is easy to see [15] that the average cost attained using these two strategies
is $k^2\sigma_0^2$ and $\frac{\sigma_0^2}{\sigma_0^2 + 1} < 1$ respectively. An upper bound is obtained using the best amongst the two linear
strategies and the lattice-based quantization strategy.
Case 1: $P^* \ge \frac{\sigma_0^2}{100}$.

The first stage cost is at least $k^2 \frac{\sigma_0^2}{100}$. Consider the upper bound of $k^2\sigma_0^2$ obtained by zero-forcing. The
ratio of the upper bound and the lower bound is no larger than 100.

Fig. 4. The ratio of the upper and the lower bounds for the scalar Witsenhausen problem (top), and the 2-D Witsenhausen problem (bottom,
using hexagonal lattice of $\xi = \frac{2}{\sqrt{3}}$) for a range of values of $k$ and $\sigma_0$. The ratio is bounded above by 17 for the scalar problem, and by
14.75 for the 2-D problem.

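Case 2: $P^* < \frac{\sigma_0^2}{100}$ and $\sigma_0^2 < 16$.

Using the bound from Theorem 2 (which is a special case of the bound in Theorem 3),

$$\kappa = \frac{\sigma_0^2}{(\sigma_0 + \sqrt{P^*})^2 + 1} \overset{(P^* < \frac{\sigma_0^2}{100})}{\ge} \frac{\sigma_0^2}{\sigma_0^2\left(1 + \frac{1}{\sqrt{100}}\right)^2 + 1} \overset{(\sigma_0^2 < 16)}{\ge} \frac{\sigma_0^2}{16\left(1 + \frac{1}{\sqrt{100}}\right)^2 + 1} = \frac{\sigma_0^2}{20.36} \ge \frac{\sigma_0^2}{21}.$$

Thus, for $\sigma_0^2 < 16$ and $P^* < \frac{\sigma_0^2}{100}$,

$$\left(\left(\sqrt{\kappa} - \sqrt{P^*}\right)^+\right)^2 \ge \left(\sqrt{\frac{\sigma_0^2}{21}} - \sqrt{\frac{\sigma_0^2}{100}}\right)^2 = \sigma_0^2 \left(\frac{1}{\sqrt{21}} - \frac{1}{\sqrt{100}}\right)^2 \approx 0.014\,\sigma_0^2.$$

Using the zero-input upper bound of $\frac{\sigma_0^2}{\sigma_0^2 + 1} < \sigma_0^2$, the ratio of the upper and lower bounds is at most $\frac{1}{0.014} < 72$.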



Case 3: $P^* < \frac{\sigma_0^2}{100}$, $\sigma_0^2 \ge 16$, $P^* \le \frac{1}{2}$.

In this case,

$$\kappa = \frac{\sigma_0^2}{(\sigma_0 + \sqrt{P^*})^2 + 1} \overset{(P^* \le \frac{1}{2})}{\ge} \frac{\sigma_0^2}{(\sigma_0 + \sqrt{0.5})^2 + 1} \overset{(a)}{\ge} \frac{16}{(\sqrt{16} + \sqrt{0.5})^2 + 1} \approx 0.6909 \ge 0.69,$$

where (a) uses $\sigma_0^2 \ge 16$ and the observation that $\frac{x^2}{(x + b)^2 + 1} = \frac{1}{\left(1 + \frac{b}{x}\right)^2 + \frac{1}{x^2}}$ is an increasing function of $x$ for
$x, b > 0$. Thus,

$$\left(\left(\sqrt{\kappa} - \sqrt{P^*}\right)^+\right)^2 \ge \left(\left(\sqrt{0.69} - \sqrt{0.5}\right)^+\right)^2 \approx 0.0153 \ge 0.015.$$

Using the upper bound of $\frac{\sigma_0^2}{\sigma_0^2 + 1} < 1$, the ratio of the upper and the lower bounds is smaller than $\frac{1}{0.015} < 67$.

Fig. 5. An exact calculation of the first and second stage costs yields an improved maximum ratio smaller than 8 for the scalar Witsenhausen
problem.
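The numerical constants in Cases 2 and 3 are easy to reproduce. The short sketch below simply re-evaluates the expressions that appear in those two cases; the variable names are assumptions introduced for this check.

```python
import math

# Case 2: for sigma_0^2 < 16 and P* < sigma_0^2 / 100,
# kappa >= sigma_0^2 / (16 (1 + 1/sqrt(100))^2 + 1) >= sigma_0^2 / 21.
case2_denominator = 16.0 * (1.0 + 1.0 / math.sqrt(100.0)) ** 2 + 1.0
# The second-stage lower bound is then sigma_0^2 times this constant.
case2_constant = (1.0 / math.sqrt(21.0) - 1.0 / math.sqrt(100.0)) ** 2

# Case 3: for sigma_0^2 >= 16 and P* <= 1/2, kappa is at least this value.
kappa_floor = 16.0 / ((math.sqrt(16.0) + math.sqrt(0.5)) ** 2 + 1.0)
case3_constant = (math.sqrt(0.69) - math.sqrt(0.5)) ** 2

print("Case 2 denominator:", case2_denominator)
print("Case 2 constant:", case2_constant, "-> ratio at most", 1.0 / case2_constant)
print("Case 3 kappa floor:", kappa_floor)
print("Case 3 constant:", case3_constant, "-> ratio at most", 1.0 / 0.015)
```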
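Case 4: $\sigma_0^2 \ge 16$, $\frac{1}{2} < P^* < \frac{\sigma_0^2}{100}$.

Using $L = 2$ in the lower bound,

$$c_m(L) = \frac{1}{\Pr(\|\mathbf{Z}^m\|^2 \le mL^2)} = \frac{1}{1 - \Pr(\|\mathbf{Z}^m\|^2 > mL^2)} \overset{\text{(Markov's ineq.)}}{\le} \frac{1}{1 - \frac{m}{mL^2}} \overset{(L = 2)}{=} \frac{4}{3}.$$

Similarly,

$$d_m(2) = \frac{\Pr(\|\mathbf{Z}^{m+2}\|^2 \le mL^2)}{\Pr(\|\mathbf{Z}^m\|^2 \le mL^2)} \ge \Pr(\|\mathbf{Z}^{m+2}\|^2 \le mL^2) = 1 - \Pr(\|\mathbf{Z}^{m+2}\|^2 > mL^2)$$
$$\overset{\text{(Markov's ineq.)}}{\ge} 1 - \frac{m + 2}{mL^2} = 1 - \frac{1 + \frac{2}{m}}{4} \overset{(m \ge 1)}{\ge} 1 - \frac{3}{4} = \frac{1}{4}.$$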
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In the bound, we are free to use any Uq > 1. Using o 2 G = 6P* > 1, 

2 2 

k 2 : 



(a) 
> 



6PV 2 , 



K + f§) 2 



ioo y \3J 



3 
C4 



(m>l) 

> 1.255P*. 



where (a) uses cr| = 6P*,P* < ^,c m (2) < f and 1 > d m (2) > \. Thus, 



((v 7 ^ - v^) + y > P*(Vl.255 - l) 2 > 



70' 
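The Markov-inequality bounds on $c_m(2)$ and $d_m(2)$, and the constants 1.255 and 1/70, can be sanity-checked numerically. The sketch below is an illustration under stated assumptions: the chi-square CDF is computed with a plain series for the regularized incomplete gamma function (adequate for the moderate arguments used here), and all function names are hypothetical.

```python
import math


def chi2_cdf(k, x, terms=400):
    """CDF of a chi-square variable with k degrees of freedom, via the series
    P(a, y) = sum_n exp(-y) y^(a+n) / Gamma(a+n+1) with a = k/2, y = x/2."""
    if x <= 0.0:
        return 0.0
    a, y = k / 2.0, x / 2.0
    total = sum(math.exp(-y + (a + n) * math.log(y) - math.lgamma(a + n + 1.0))
                for n in range(terms))
    return min(total, 1.0)


def c_m(m, L):
    """c_m(L) = 1 / Pr(||Z^m||^2 <= m L^2) for standard Gaussian Z^m."""
    return 1.0 / chi2_cdf(m, m * L * L)


def d_m(m, L):
    """d_m(L) = Pr(||Z^{m+2}||^2 <= m L^2) / Pr(||Z^m||^2 <= m L^2)."""
    return chi2_cdf(m + 2, m * L * L) / chi2_cdf(m, m * L * L)


def kappa2_coefficient(m):
    """Coefficient of P* in the Case 4 bound on kappa_2 after inserting the
    worst-case constants c_m(2) <= 4/3, d_m(2) >= 1/4, P* < sigma0^2/100."""
    return 6.0 / ((1.21 + 0.06) * (4.0 / 3.0) ** (2.0 / m) * math.exp(0.75))
```

The coefficient is smallest at $m = 1$, which is why the step labelled $(m \geq 1)$ yields a bound valid for every vector length.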

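Now, using the lower bound on the total cost from Theorem 3 and substituting $L = 2$,

$$\begin{aligned}
J_{\min}(m,k^2,\sigma_0^2) &\geq k^2P^* + \frac{\sigma_G^m}{c_m(2)}\exp\left(-\frac{mL^2(\sigma_G^2-1)}{2}\right)\left(\left(\sqrt{\kappa_2}-\sqrt{P^*}\right)^+\right)^2 \\
&\overset{(\sigma_G^2=6P^*)}{\geq} k^2P^* + \frac{(6P^*)^{\frac{m}{2}}}{c_m(2)}\exp\left(-\frac{4m(6P^*-1)}{2}\right)\frac{P^*}{70} \\
&\overset{(a)}{\geq} k^2P^* + \frac{3}{4}\,3^{\frac{m}{2}}\,e^{2m}\,e^{-12P^*m}\,\frac{1}{70\times 2} \\
&\overset{(m\geq 1)}{\geq} k^2P^* + \frac{3\times\sqrt{3}\times e^2}{4\times 70\times 2}\,e^{-12mP^*} \qquad (13) \\
&> k^2P^* + \frac{9}{70\times 2}\,e^{-12mP^*}, \qquad (14)
\end{aligned}$$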



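where (a) uses $c_m(2) \leq \frac{4}{3}$ and $P^* \geq \frac{1}{2}$. We loosen the lattice-based upper bound from Theorem 1 and bring it into a form similar to (14). Here, $P$ is a part of the optimization:

$$\begin{aligned}
J_{\min}(m,k^2,\sigma_0^2) &\leq \inf_{P>\xi^2} k^2P + \left(1+\sqrt{\frac{P}{\xi^2}}\right)^2 e^{-\frac{mP}{2\xi^2}+\frac{m+2}{2}\left(1+\ln\left(\frac{P}{\xi^2}\right)\right)} \\
&= \inf_{P>\xi^2} k^2P + \frac{1}{9}\,e^{-\frac{mP}{2\xi^2}+\frac{m+2}{2}\left(1+\ln\left(\frac{P}{\xi^2}\right)\right)+2\ln\left(1+\sqrt{\frac{P}{\xi^2}}\right)+\ln(9)} \\
&= \inf_{P>\xi^2} k^2P + \frac{1}{9}\,e^{-\frac{0.12mP}{\xi^2}}\times e^{-m\left(\frac{0.38P}{\xi^2}-\frac{m+2}{2m}\left(1+\ln\left(\frac{P}{\xi^2}\right)\right)-\frac{2}{m}\ln\left(1+\sqrt{\frac{P}{\xi^2}}\right)-\frac{\ln(9)}{m}\right)} \\
&\overset{(m\geq 1)}{\leq} \inf_{P>\xi^2} k^2P + \frac{1}{9}\,e^{-\frac{0.12mP}{\xi^2}}\,e^{-m\left(\frac{0.38P}{\xi^2}-\frac{3}{2}\left(1+\ln\left(\frac{P}{\xi^2}\right)\right)-2\ln\left(1+\sqrt{\frac{P}{\xi^2}}\right)-\ln(9)\right)} \\
&\leq \inf_{P\geq 34\xi^2} k^2P + \frac{1}{9}\,e^{-\frac{0.12mP}{\xi^2}}, \qquad (15)
\end{aligned}$$

where the last inequality follows from the fact that $\frac{0.38P}{\xi^2} > \frac{3}{2}\left(1+\ln\left(\frac{P}{\xi^2}\right)\right) + 2\ln\left(1+\sqrt{\frac{P}{\xi^2}}\right) + \ln(9)$ for $\frac{P}{\xi^2} \geq 34$. This can be checked easily by plotting it. Using $P = 100\xi^2P^* \geq 50\xi^2 > 34\xi^2$ (since $P^* \geq \frac{1}{2}$) in (15),

$$J_{\min}(m,k^2,\sigma_0^2) \leq k^2\,100\xi^2P^* + \frac{1}{9}\,e^{-\frac{0.12m\times 100\xi^2P^*}{\xi^2}} = k^2\,100\xi^2P^* + \frac{1}{9}\,e^{-12mP^*}. \qquad (16)$$

Using (14) and (16), the ratio $\mu$ of the upper and the lower bounds is bounded for all $m$ since

$$\mu \leq \frac{k^2\,100\xi^2P^* + \frac{1}{9}e^{-12mP^*}}{k^2P^* + \frac{9}{70\times 2}e^{-12mP^*}} \leq \max\left\{\frac{k^2\,100\xi^2P^*}{k^2P^*},\ \frac{\frac{1}{9}}{\frac{9}{70\times 2}}\right\} = 100\xi^2.$$

For $m = 1$, $\xi = 1$, and thus in the proof the ratio $\mu \leq 100$. For $m$ large, $\xi \approx 2$ [46], and $\mu \leq 400$. For arbitrary $m$, using the recursive construction in [51, Theorem 8.18], $\xi \leq 4$, and thus $\mu \leq 1600$ regardless of $m$. ■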
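The inequality used to pass to (15) can also be checked numerically rather than by plotting. The sketch below writes $b = \sqrt{P/\xi^2}$ and evaluates the function $g(b) = 0.38b^2 - \frac{3}{2}(1+\ln b^2) - 2\ln(1+b) - \ln 9$ together with its first two derivatives; the function names are illustrative.

```python
import math


def g(b):
    """g(b) = 0.38 b^2 - (3/2)(1 + ln b^2) - 2 ln(1 + b) - ln 9."""
    return 0.38 * b * b - 1.5 * (1.0 + math.log(b * b)) - 2.0 * math.log(1.0 + b) - math.log(9.0)


def g_prime(b):
    """g'(b) = 0.76 b - 3/b - 2/(1 + b)."""
    return 0.76 * b - 3.0 / b - 2.0 / (1.0 + b)


def g_second(b):
    """g''(b) = 0.76 + 3/b^2 + 2/(1 + b)^2, positive for every b > 0."""
    return 0.76 + 3.0 / (b * b) + 2.0 / (1.0 + b) ** 2


b0 = math.sqrt(34.0)
values_at_threshold = (g(b0), g_prime(b0), g_second(b0))
```

Since $g$ is convex with positive value and positive slope at $b = \sqrt{34}$, it stays positive for all larger $b$, which is exactly the condition $P \geq 34\xi^2$.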
Though the proof above succeeds in showing that the ratio is uniformly bounded by a constant, it is not 
very insightful and the constant is large. However, since the underlying vector bound can be tightened 
(as shown in [32]), it is not worth improving the proof for increased elegance at this time. The important 
thing is that such a uniform constant exists. 

A numerical evaluation of the upper and lower bounds (of Theorem 1 and 3 respectively) shows that the ratio is smaller than 17 for $m = 1$ (see Fig. 4). A precise calculation of the cost of the quantization strategy improves the upper bound to yield a maximum ratio smaller than 8 (see Fig. 5).

A simple grid lattice has a packing-covering ratio $\xi = \sqrt{m}$. Therefore, while the grid lattice has the best possible packing-covering ratio of 1 in the scalar case, it has a rather large packing-covering ratio of $\sqrt{2}$ ($\approx 1.41$) for $m = 2$. On the other hand, a hexagonal lattice (for $m = 2$) has an improved packing-covering ratio of $\frac{2}{\sqrt{3}} \approx 1.15$. In contrast with $m = 1$, where the ratio of upper and lower bounds of Theorem 1 and 3 is approximately 17, a hexagonal lattice yields a ratio smaller than 14.75, despite having a larger packing-covering ratio. This is a consequence of the tightening of the sphere-packing lower bound (Theorem 3) as $m$ gets large.

Footnote: The inequality used to obtain (15) can also be verified symbolically by examining the expression $g(b) = 0.38b^2 - \frac{3}{2}(1+\ln b^2) - 2\ln(1+b) - \ln(9)$, taking its derivative $g'(b) = 0.76b - \frac{3}{b} - \frac{2}{1+b}$, and second derivative $g''(b) = 0.76 + \frac{3}{b^2} + \frac{2}{(1+b)^2} > 0$. Thus $g(\cdot)$ is convex-$\cup$. Further, $g'(\sqrt{34}) \approx 3.62 > 0$, and $g(\sqrt{34}) \approx 0.09$, and so $g(b) > 0$ whenever $b \geq \sqrt{34}$.
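The two packing-covering ratios quoted for $m = 2$ can be reproduced by brute force. The following sketch estimates $\xi = r_c / r_p$ for a planar lattice given a (reduced) basis; the function name, the sampling resolution `n`, and the search window `reach` are assumptions made for this illustration.

```python
import math


def packing_covering_ratio(b1, b2, n=60, reach=2):
    """Estimate xi = r_c / r_p for the 2-D lattice spanned by b1 and b2."""
    idx = range(-reach, reach + 2)
    points = [(i * b1[0] + j * b2[0], i * b1[1] + j * b2[1], i, j)
              for i in idx for j in idx]
    # Packing radius: half the length of the shortest nonzero lattice vector.
    r_p = 0.5 * min(math.hypot(x, y) for x, y, i, j in points if (i, j) != (0, 0))
    # Covering radius: largest distance from a point of the fundamental
    # parallelogram to its nearest lattice point, sampled on an n-by-n grid.
    r_c = 0.0
    for a in range(n):
        for b in range(n):
            s, t = a / n, b / n
            px, py = s * b1[0] + t * b2[0], s * b1[1] + t * b2[1]
            nearest = min(math.hypot(px - x, py - y) for x, y, _, _ in points)
            r_c = max(r_c, nearest)
    return r_c / r_p


xi_grid = packing_covering_ratio((1.0, 0.0), (0.0, 1.0))
xi_hex = packing_covering_ratio((1.0, 0.0), (0.5, math.sqrt(3.0) / 2.0))
```

With `n` divisible by both 2 and 3 the sampling grid contains the deep holes of both lattices, so the estimates coincide with $\sqrt{2}$ and $2/\sqrt{3}$ up to rounding.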

VI. Discussions of numerical explorations and Conclusions 

Though lattice-based quantization strategies allow us to get within a constant factor of the optimal 
cost for the vector Witsenhausen problem, they are not optimal. This is known for the scalar [5] and 
the infinite-length case [15]. It is shown in [15] that the "slopey-quantization" strategy of Lee, Lau and 
Ho [5] that is believed to be very close to optimal in the scalar case can be viewed as an instance of a 
linear scaling followed by a dirty-paper coding (DPC) strategy. Such DPC-based strategies are also the
best known strategies in the asymptotic infinite-dimensional case, requiring optimal power $P$ to attain
zero asymptotic mean-square error in the estimation of $x_1^m$, and attaining costs within a factor of 1.3 of
the optimal [32] for all $(k, \sigma_0^2)$. This leads us to conjecture that a DPC-like strategy might be optimal
for finite-vector lengths as well. In the following, we numerically explore the performance of DPC-like 
strategies. 

It is natural to ask how much there is to gain using a DPC-based strategy over a simple quantization 
strategy. Notice that the DPC-strategy gains not only from the slopey quantization, but also from the 
MMSE-estimation at the second controller. In Fig. 6, we eliminate the latter advantage by considering first
a uniform quantization-based strategy with an appropriate scaling of the MLE so that it approximates the
MMSE-estimation performance, and then the actual MMSE-estimation strategy for uniform quantization.
Along the curve $k\sigma_0 = \sqrt{10}$, there is significant gain in using this approximate-MMSE estimation over
MLE, and further gain in using MMSE-estimation itself. This also shows that there is an interesting 
tradeoff between the complexity of the second controller and the system performance. 

From Fig. 6, along the curve $k\sigma_0 = \sqrt{10}$, the DPC-based strategy performs only negligibly better than a quantization-based strategy with MMSE estimation. Fig. 7 (a) shows that this is not true in general. A DPC-based strategy can perform up to 15% better than a simple quantization-based scheme depending on the problem parameters. Interestingly, the advantage of using a DPC-based strategy for the case of $k = 0.2$, $\sigma_0 = 5$ (which is used as the benchmark case in many papers, e.g. [5], [8]) is quite small. The maximum gain of about 15% is obtained at $k \approx 10^{-0.2} \approx 0.63$ and $\sigma_0 = 1$ (and indeed, any $\sigma_0 > 1$). In the future, we suggest the community use the point (0.63, 1) as the benchmark case.

Footnote (on the tightening of the sphere-packing lower bound as $m$ gets large): Indeed, in the limit $m \to \infty$, the ratio of the asymptotic average costs attained by a vector-quantization strategy and the vector lower bound of Theorem 2 is bounded by 4.45 [15].

Fig. 6. Ratio of the achievable costs to the scalar lower bound along $k\sigma_0 = 10^{-0.5}$ for various strategies (horizontal axis: $\log_{10}(\sigma_0)$). Quantization with MMSE-estimation at the second controller outperforms quantization with MLE, or even scaled MLE. For slopey-quantization with heuristic DPC-parameter, the parameter $\alpha$ in the DPC-based scheme is borrowed from the infinite-length analysis. The figure suggests that along this path ($k\sigma_0 = \sqrt{10}$), the difference between optimal-DPC and heuristic DPC is not substantial. However, Fig. 7 (b) shows that this is not true in general.

Fig. 7. (a) shows the ratio of cost attained by linear+quantization (with MMSE decoding) to DPC with parameter $\alpha$ obtained by brute-force optimization. DPC can do up to 15% better than the optimal quantization strategy. Also, the maximum is attained along $k \approx 0.6$, which is different from $k = 0.2$ of the benchmark problem [5]. (b) shows the ratio of cost attained by linear+quantization to DPC with $\alpha$ borrowed from infinite-length optimization. Heuristic DPC does not outperform linear+quantization (with MMSE estimation) substantially.

Given that there is an advantage in using a DPC-like strategy, an interesting question is whether the DPC parameter $\alpha$ that optimizes the DPC-based strategy's performance at infinite lengths (in [15]) gives good performance for the scalar case as well. Fig. 7 (b) answers this question at least partially in the negative. This heuristic-DPC does only slightly better than a quantization strategy with MMSE estimation, whereas other values of $\alpha$ do significantly better.

Finally, we observe that while uniform bin-size quantization or DPC-based strategies are designed for atypical noise behavior, atypical behavior of the initial state is better accommodated by using nonuniform bin-sizes (such as those in [5], [8]). Table I compares the two. Clearly, the advantage in having nonuniform slopey-quantization is small, but not negligible. It would be interesting to calibrate the advantage of nonuniform bin sizes for $(k, \sigma_0) = (0.63, 1)$, a maximum gain point for uniform bin-size slopey-quantization strategies.

TABLE I

Costs attained for the benchmark case of $k = 0.2$, $\sigma_0 = 5$.

                              linear+quantization    Slopey-quantization
Lee, Lau and Ho [5]           0.1713946              0.1673132
Li, Marden and Shamma [8]     -                      0.1670790
This paper                    0.1715335              0.1673654


There are plenty of open problems that arise naturally. Both the lower and the upper bounds have room for improvement. The lower bound can be improved by tightening the vector lower bound of [15] (one such tightening is performed in [32]) and obtaining corresponding finite-length results using the sphere-packing tools developed here.

Tightening the upper bound can be performed by using DPC-based techniques over lattices. Further, an exact analysis of the required first-stage power when using a lattice would yield an improvement (as pointed out earlier, for $m = 1$, $\frac{1}{m}k^2r_c^2$ overestimates the required first-stage cost), especially for small $m$. Improved lattice designs with better packing-covering ratios would also improve the upper bound.

Perhaps a more significant set of open problems are the next steps in understanding more realistic 
versions of Witsenhausen's problem, specifically those that include costs on all the inputs and all the 
states [13], with noisy state evolution and noisy observations at both controllers. The hope is that solutions 
to these problems can then be used as the basis for provably-good nonlinear controller synthesis for larger 
distributed systems. Further, tools developed for solving these problems might help address multiuser 
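problems in information theory, in the spirit of [52], [53].

Appendix I
Proof of Lemma 1

$$\begin{aligned}
\mathbb{E}_{Z^m}\left[(\|Z^m\|+r_p)^2\,1_{\{\mathcal{E}_m\}}\right] &= \mathbb{E}_{Z^m}\left[\|Z^m\|^2\,1_{\{\mathcal{E}_m\}}\right] + r_p^2\Pr(\mathcal{E}_m) + 2r_p\,\mathbb{E}_{Z^m}\left[\left(1_{\{\mathcal{E}_m\}}\right)\left(\|Z^m\|\,1_{\{\mathcal{E}_m\}}\right)\right] \\
&\overset{(a)}{\leq} \mathbb{E}_{Z^m}\left[\|Z^m\|^2\,1_{\{\mathcal{E}_m\}}\right] + r_p^2\Pr(\mathcal{E}_m) + 2r_p\sqrt{\mathbb{E}_{Z^m}\left[1_{\{\mathcal{E}_m\}}\right]}\sqrt{\mathbb{E}_{Z^m}\left[\|Z^m\|^2\,1_{\{\mathcal{E}_m\}}\right]} \\
&= \left(\sqrt{\mathbb{E}_{Z^m}\left[\|Z^m\|^2\,1_{\{\mathcal{E}_m\}}\right]} + r_p\sqrt{\Pr(\mathcal{E}_m)}\right)^2, \qquad (18)
\end{aligned}$$

where (a) uses the Cauchy-Schwartz inequality [54, Pg. 13].

We wish to express $\mathbb{E}_{Z^m}\left[\|Z^m\|^2\,1_{\{\mathcal{E}_m\}}\right]$ in terms of $\psi(m,r_p) := \Pr(\|Z^m\| \geq r_p) = \int_{\|z^m\|\geq r_p} \frac{e^{-\frac{\|z^m\|^2}{2}}}{(\sqrt{2\pi})^m}\,dz^m$. Denote by $A_m(r) := \frac{2\pi^{\frac{m}{2}}r^{m-1}}{\Gamma\left(\frac{m}{2}\right)}$ the surface area of a sphere of radius $r$ in $\mathbb{R}^m$ [55, Pg. 458], where $\Gamma(\cdot)$ is the Gamma-function satisfying $\Gamma(m) = (m-1)\Gamma(m-1)$, $\Gamma(1) = 1$, and $\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$. Dividing the space $\mathbb{R}^m$ into shells of thickness $dr$ and radii $r$,

$$\begin{aligned}
\mathbb{E}_{Z^m}\left[\|Z^m\|^2\,1_{\{\mathcal{E}_m\}}\right] &= \int_{\|z^m\|\geq r_p} \|z^m\|^2\,\frac{e^{-\frac{\|z^m\|^2}{2}}}{(\sqrt{2\pi})^m}\,dz^m = \int_{r\geq r_p} r^2\,\frac{e^{-\frac{r^2}{2}}}{(\sqrt{2\pi})^m}\,A_m(r)\,dr \\
&= \int_{r\geq r_p} \frac{e^{-\frac{r^2}{2}}}{(\sqrt{2\pi})^m}\,\frac{2\pi^{\frac{m}{2}}r^{m+1}}{\Gamma\left(\frac{m}{2}\right)}\,dr = m\int_{r\geq r_p} \frac{e^{-\frac{r^2}{2}}}{(\sqrt{2\pi})^{m+2}}\,\frac{2\pi^{\frac{m+2}{2}}r^{m+1}}{\Gamma\left(\frac{m+2}{2}\right)}\,dr = m\,\psi(m+2,r_p). \qquad (19)
\end{aligned}$$

Using (18), (19), and $r_p = \sqrt{\frac{mP}{\xi^2}}$,

$$\mathbb{E}_{Z^m}\left[(\|Z^m\|+r_p)^2\,1_{\{\mathcal{E}_m\}}\right] \leq m\left(\sqrt{\psi(m+2,r_p)} + \sqrt{\frac{P}{\xi^2}}\sqrt{\psi(m,r_p)}\right)^2,$$

which yields the first part of Lemma 1. To obtain a closed-form upper bound we consider $P > \xi^2$. It suffices to bound $\psi(\cdot,\cdot)$.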
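Identity (19) is easy to confirm numerically, since $\|Z^m\|^2$ is chi-square distributed with $m$ degrees of freedom. The sketch below compares a Simpson-rule integral of the truncated second moment with $m\,\psi(m+2, r_p)$; the helper names, the integration window, and the series-based CDF are choices made for this illustration only.

```python
import math


def chi2_cdf(k, x, terms=400):
    """Chi-square CDF with k degrees of freedom (incomplete-gamma series)."""
    if x <= 0.0:
        return 0.0
    a, y = k / 2.0, x / 2.0
    total = sum(math.exp(-y + (a + n) * math.log(y) - math.lgamma(a + n + 1.0))
                for n in range(terms))
    return min(total, 1.0)


def psi(m, r):
    """psi(m, r) = Pr(||Z^m|| >= r) for a standard Gaussian vector Z^m."""
    return 1.0 - chi2_cdf(m, r * r)


def chi2_pdf(k, x):
    """Density of a chi-square variable with k degrees of freedom, x > 0."""
    return math.exp((k / 2.0 - 1.0) * math.log(x) - x / 2.0
                    - (k / 2.0) * math.log(2.0) - math.lgamma(k / 2.0))


def truncated_second_moment(m, r, width=200.0, steps=20000):
    """Simpson's rule for E[||Z^m||^2 1{||Z^m|| >= r}] = int_{r^2}^inf x f_m(x) dx."""
    lo, h = r * r, width / steps
    total = lo * chi2_pdf(m, lo) + (lo + width) * chi2_pdf(m, lo + width)
    for i in range(1, steps):
        x = lo + i * h
        total += (4.0 if i % 2 else 2.0) * x * chi2_pdf(m, x)
    return total * h / 3.0
```

For example, `truncated_second_moment(2, 1.5)` and `2 * psi(4, 1.5)` agree to within the integration error.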



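$$\begin{aligned}
\psi(m,r_p) &= \Pr(\|Z^m\|^2 \geq r_p^2) = \Pr\left(\exp\left(\rho\sum_{i=1}^m Z_i^2\right) \geq \exp(\rho r_p^2)\right) \\
&\overset{(a)}{\leq} \mathbb{E}\left[\exp\left(\rho\sum_{i=1}^m Z_i^2\right)\right]e^{-\rho r_p^2} = \mathbb{E}_{Z_1}\left[\exp(\rho Z_1^2)\right]^m e^{-\rho r_p^2} \overset{(\text{for } 0<\rho<0.5)}{=} \frac{1}{(1-2\rho)^{\frac{m}{2}}}\,e^{-\rho r_p^2},
\end{aligned}$$

where (a) follows from the Markov inequality, and the last equality follows from the fact that the moment generating function of a standard $\chi_1^2$ random variable is $\frac{1}{(1-2\rho)^{\frac{1}{2}}}$ for $\rho \in (0, 0.5)$ [56, Pg. 375].

Since this bound holds for any $\rho \in (0, 0.5)$, we choose the minimizing $\rho^* = \frac{1}{2}\left(1-\frac{m}{r_p^2}\right)$. Since $r_p^2 = \frac{mP}{\xi^2}$, $\rho^*$ is indeed in $(0, 0.5)$ as long as $P > \xi^2$. Thus,

$$\psi(m,r_p) \leq \frac{1}{(1-2\rho^*)^{\frac{m}{2}}}\,e^{-\rho^* r_p^2} = e^{-\frac{1}{2}\left(1-\frac{m}{r_p^2}\right)r_p^2 - \frac{m}{2}\ln\left(\frac{m}{r_p^2}\right)} = e^{-\frac{r_p^2}{2}+\frac{m}{2}+\frac{m}{2}\ln\left(\frac{r_p^2}{m}\right)}.$$

Using the substitutions $r_c^2 = mP$, $\xi = \frac{r_c}{r_p}$ and $r_p^2 = \frac{mP}{\xi^2}$,

$$\Pr(\mathcal{E}_m) = \psi(m,r_p) = \psi\left(m,\sqrt{\frac{mP}{\xi^2}}\right) \leq e^{-\frac{mP}{2\xi^2}+\frac{m}{2}+\frac{m}{2}\ln\left(\frac{P}{\xi^2}\right)}, \qquad (20)$$

and

$$\mathbb{E}_{Z^m}\left[\|Z^m\|^2\,1_{\{\mathcal{E}_m\}}\right] \leq m\,\psi\left(m+2,\sqrt{\frac{mP}{\xi^2}}\right) \leq m\,e^{-\frac{mP}{2\xi^2}+\frac{m+2}{2}+\frac{m+2}{2}\ln\left(\frac{mP}{(m+2)\xi^2}\right)}. \qquad (21)$$

From (18), (20) and (21),

$$\begin{aligned}
\mathbb{E}_{Z^m}\left[(\|Z^m\|+r_p)^2\,1_{\{\mathcal{E}_m\}}\right] &\leq \left(\sqrt{m}\,e^{-\frac{mP}{4\xi^2}+\frac{m+2}{4}+\frac{m+2}{4}\ln\left(\frac{mP}{(m+2)\xi^2}\right)} + \sqrt{\frac{mP}{\xi^2}}\,e^{-\frac{mP}{4\xi^2}+\frac{m}{4}+\frac{m}{4}\ln\left(\frac{P}{\xi^2}\right)}\right)^2 \\
&\overset{(\text{since } P>\xi^2)}{\leq} m\left(1+\sqrt{\frac{P}{\xi^2}}\right)^2 e^{-\frac{mP}{2\xi^2}+\frac{m+2}{2}+\frac{m+2}{2}\ln\left(\frac{P}{\xi^2}\right)}.
\end{aligned}$$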
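The Chernoff-type bound on $\psi(m, r_p)$ derived above can be compared against the exact chi-square tail. The sketch below is illustrative only: `chernoff_bound` implements $e^{-r^2/2 + m/2 + (m/2)\ln(r^2/m)}$, which is meaningful for $r^2 > m$ (equivalently $P > \xi^2$), and the exact tail uses a series for the incomplete gamma function.

```python
import math


def chi2_cdf(k, x, terms=400):
    """Chi-square CDF with k degrees of freedom (incomplete-gamma series)."""
    if x <= 0.0:
        return 0.0
    a, y = k / 2.0, x / 2.0
    total = sum(math.exp(-y + (a + n) * math.log(y) - math.lgamma(a + n + 1.0))
                for n in range(terms))
    return min(total, 1.0)


def psi(m, r_sq):
    """Exact tail Pr(||Z^m||^2 >= r_sq) for a standard Gaussian vector."""
    return 1.0 - chi2_cdf(m, r_sq)


def chernoff_bound(m, r_sq):
    """Bound exp(-r^2/2 + m/2 + (m/2) ln(r^2/m)), obtained with rho* = (1 - m/r^2)/2."""
    return math.exp(-r_sq / 2.0 + m / 2.0 + (m / 2.0) * math.log(r_sq / m))


# Ratio of the bound to the exact tail for m = 2 and r^2 = m P / xi^2 with P / xi^2 = 4.
looseness = chernoff_bound(2, 8.0) / psi(2, 8.0)
```

For $m = 2$ the exact tail is $e^{-r^2/2}$, so the bound is loose by the factor $e\,r^2/2$ at this point; the exponential decay rate in $P$ is nevertheless the same.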



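Appendix II
Proof of Lemma 2

The following lemma is taken from [15].

Lemma 3: For any three random variables $A$, $B$ and $C$,

$$\mathbb{E}\left[\|B-C\|^2\right] \geq \left(\left(\sqrt{\mathbb{E}\left[\|A-C\|^2\right]} - \sqrt{\mathbb{E}\left[\|A-B\|^2\right]}\right)^+\right)^2.$$

Proof: See [15, Appendix II].

Choosing $A = X_0^m$, $B = X_1^m$ and $C = \hat{X}_1^m$,

$$\begin{aligned}
\mathbb{E}_{X_0^m,Z_G^m}\left[J_2^{(\gamma)}(X_0^m,Z_G^m)\,\middle|\,Z_G^m\in\mathcal{S}_L^G\right] &= \frac{1}{m}\mathbb{E}_{X_0^m,Z_G^m}\left[\|X_1^m-\hat{X}_1^m\|^2\,\middle|\,Z_G^m\in\mathcal{S}_L^G\right] \\
&\geq \left(\left(\sqrt{\frac{1}{m}\mathbb{E}_{X_0^m,Z_G^m}\left[\|X_0^m-\hat{X}_1^m\|^2\,\middle|\,Z_G^m\in\mathcal{S}_L^G\right]} - \sqrt{\frac{1}{m}\mathbb{E}_{X_0^m,Z_G^m}\left[\|X_0^m-X_1^m\|^2\,\middle|\,Z_G^m\in\mathcal{S}_L^G\right]}\right)^+\right)^2 \\
&= \left(\left(\sqrt{\frac{1}{m}\mathbb{E}_{X_0^m,Z_G^m}\left[\|X_0^m-\hat{X}_1^m\|^2\,\middle|\,Z_G^m\in\mathcal{S}_L^G\right]} - \sqrt{P}\right)^+\right)^2, \qquad (22)
\end{aligned}$$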



since $X_1^m - X_0^m = U_1^m$ is independent of $Z_G^m$ and $\mathbb{E}\left[\|U_1^m\|^2\right] = mP$. Define $Y_L^m := X_1^m + Z_L^m$ to be the output when the observation noise $Z_L^m$ is distributed as a truncated Gaussian distribution:

$$f_{Z_L}(z_L^m) = \begin{cases} c_m(L)\,\dfrac{e^{-\frac{\|z_L^m\|^2}{2\sigma_G^2}}}{\left(\sqrt{2\pi\sigma_G^2}\right)^m} & z_L^m \in \mathcal{S}_L^G \\ 0 & \text{otherwise.} \end{cases} \qquad (23)$$

Let the estimate at the second controller on observing $y_L^m$ be denoted by $\hat{X}_L^m$. Then, by the definition of conditional expectations,

$$\mathbb{E}_{X_0^m,Z_G^m}\left[\|X_0^m-\hat{X}_1^m\|^2\,\middle|\,Z_G^m\in\mathcal{S}_L^G\right] = \mathbb{E}_{X_0^m,Z_L^m}\left[\|X_0^m-\hat{X}_L^m\|^2\right]. \qquad (24)$$

To get a lower bound, we now allow the controllers to optimize themselves with the additional knowledge that the observation noise $z^m$ must fall in $\mathcal{S}_L^G$. In order to prevent the first controller from "cheating" and allocating different powers to the two events (i.e. $z^m$ falling or not falling in $\mathcal{S}_L^G$), we enforce the constraint that the power $P$ must not change with this additional knowledge. Since the controller's observation $X_0^m$ is independent of $Z^m$, this constraint is satisfied by the original controller (without the additional knowledge) as well, and hence the cost for the system with the additional knowledge is still a valid lower bound to that of the original system.

The rest of the proof uses ideas from channel coding and the rate-distortion theorem [57, Ch. 13] from information theory. We view the problem as a problem of implicit communication from the first controller to the second. Notice that for a given $\gamma(\cdot)$, $X_1^m$ is a function of $X_0^m$, and $Y_L^m = X_1^m + Z_L^m$ is conditionally independent of $X_0^m$ given $X_1^m$ (since the noise $Z_L^m$ is additive and independent of $X_1^m$ and $X_0^m$). Further, $\hat{X}_L^m$ is a function of $Y_L^m$. Thus $X_0^m - X_1^m - Y_L^m - \hat{X}_L^m$ form a Markov chain. Using the data-processing inequality [57, Pg. 33],

$$I(X_0^m;\hat{X}_L^m) \leq I(X_1^m;Y_L^m), \qquad (25)$$

where $I(A;B)$ is the expression for mutual information between two random variables $A$ and $B$ (see, for example, [57, Pg. 18, Pg. 231]). To estimate the distortion to which $X_0^m$ can be communicated across this truncated Gaussian channel (which, in turn, helps us lower bound the MMSE in estimating $X_1^m$), we need to upper bound the term on the RHS of (25).
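Lemma 4:

$$\frac{1}{m}I(X_1^m;Y_L^m) \leq \frac{1}{2}\log_2\left(\frac{e^{1-d_m(L)}\left(\bar{P}+d_m(L)\sigma_G^2\right)c_m^{\frac{2}{m}}(L)}{\sigma_G^2}\right).$$

Proof: We first obtain an upper bound to the power of $X_1^m$ (this bound is the same as that used in [15]):

$$\begin{aligned}
\mathbb{E}_{X_0^m}\left[\|X_1^m\|^2\right] &= \mathbb{E}_{X_0^m}\left[\|X_0^m+U_1^m\|^2\right] = \mathbb{E}_{X_0^m}\left[\|X_0^m\|^2\right] + \mathbb{E}_{X_0^m}\left[\|U_1^m\|^2\right] + 2\,\mathbb{E}_{X_0^m}\left[{X_0^m}^T U_1^m\right] \\
&\overset{(a)}{\leq} \mathbb{E}_{X_0^m}\left[\|X_0^m\|^2\right] + \mathbb{E}_{X_0^m}\left[\|U_1^m\|^2\right] + 2\sqrt{\mathbb{E}_{X_0^m}\left[\|X_0^m\|^2\right]}\sqrt{\mathbb{E}_{X_0^m}\left[\|U_1^m\|^2\right]} \\
&\leq m\left(\sigma_0+\sqrt{P}\right)^2,
\end{aligned}$$

where (a) follows from the Cauchy-Schwartz inequality. We use the following definition of differential entropy $h(A)$ of a continuous random variable $A$ [57, Pg. 224]:

$$h(A) = -\int_S f_A(a)\log_2\left(f_A(a)\right)da, \qquad (26)$$

where $f_A(a)$ is the pdf of $A$, and $S$ is the support set of $A$. Conditional differential entropy is defined similarly [57, Pg. 229].

Let $\bar{P} := (\sigma_0+\sqrt{P})^2$. Now, $\mathbb{E}\left[Y_{L,i}^2\right] = \mathbb{E}\left[X_{1,i}^2\right] + \mathbb{E}\left[Z_{L,i}^2\right]$ (since $X_{1,i}$ is independent of $Z_{L,i}$ and by symmetry, $Z_{L,i}$ are zero mean random variables). Denote $P_i = \mathbb{E}\left[X_{1,i}^2\right]$ and $\sigma_{G,i}^2 = \mathbb{E}\left[Z_{L,i}^2\right]$. In the following, we derive an upper bound $C_{G,L}^{(m)}$ on $\frac{1}{m}I(X_1^m;Y_L^m)$.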



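$$\begin{aligned}
C_{G,L}^{(m)} &:= \sup_{p(X_1^m):\,\mathbb{E}[\|X_1^m\|^2]\leq m\bar{P}} \frac{1}{m}I(X_1^m;Y_L^m) \\
&\overset{(a)}{=} \sup_{p(X_1^m):\,\mathbb{E}[\|X_1^m\|^2]\leq m\bar{P}} \frac{1}{m}h(Y_L^m) - \frac{1}{m}h(Y_L^m|X_1^m) \\
&= \sup_{p(X_1^m):\,\mathbb{E}[\|X_1^m\|^2]\leq m\bar{P}} \frac{1}{m}h(Y_L^m) - \frac{1}{m}h(X_1^m+Z_L^m|X_1^m) \\
&\overset{(b)}{=} \sup_{p(X_1^m):\,\mathbb{E}[\|X_1^m\|^2]\leq m\bar{P}} \frac{1}{m}h(Y_L^m) - \frac{1}{m}h(Z_L^m|X_1^m) \\
&\overset{(c)}{=} \sup_{p(X_1^m):\,\mathbb{E}[\|X_1^m\|^2]\leq m\bar{P}} \frac{1}{m}h(Y_L^m) - \frac{1}{m}h(Z_L^m) \\
&\overset{(d)}{\leq} \sup_{p(X_1^m):\,\mathbb{E}[\|X_1^m\|^2]\leq m\bar{P}} \frac{1}{m}\sum_{i=1}^m h(Y_{L,i}) - \frac{1}{m}h(Z_L^m) \\
&\overset{(e)}{\leq} \sup_{P_i:\,\sum_{i=1}^m P_i\leq m\bar{P}} \frac{1}{m}\sum_{i=1}^m \frac{1}{2}\log_2\left(2\pi e\left(P_i+\sigma_{G,i}^2\right)\right) - \frac{1}{m}h(Z_L^m) \\
&\overset{(f)}{\leq} \frac{1}{2}\log_2\left(2\pi e\left(\bar{P}+d_m(L)\sigma_G^2\right)\right) - \frac{1}{m}h(Z_L^m). \qquad (27)
\end{aligned}$$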



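Here, (a) follows from the definition of mutual information [57, Pg. 231], (b) follows from the fact that translation does not change the differential entropy [57, Pg. 233], (c) uses independence of $Z_L^m$ and $X_1^m$, and (d) uses the chain rule for differential entropy [57, Pg. 232] and the fact that conditioning reduces entropy [57, Pg. 232]. In (e), we used the fact that Gaussian random variables maximize differential entropy. The inequality (f) follows from the concavity-$\cap$ of the $\log(\cdot)$ function and an application of Jensen's inequality [57, Pg. 25]. We also use the fact that $\frac{1}{m}\sum_{i=1}^m \sigma_{G,i}^2 = d_m(L)\sigma_G^2$, which can be proven as follows. With $Z^m$ denoting a standard Gaussian vector,

$$\begin{aligned}
\frac{1}{m}\mathbb{E}\left[\sum_{i=1}^m Z_{L,i}^2\right] &\overset{(\text{using (23)})}{=} \frac{1}{m}\int_{z^m\in\mathcal{S}_L^G} \|z^m\|^2\,c_m(L)\,\frac{\exp\left(-\frac{\|z^m\|^2}{2\sigma_G^2}\right)}{\left(\sqrt{2\pi\sigma_G^2}\right)^m}\,dz^m \\
&= \frac{c_m(L)\sigma_G^2}{m}\,\mathbb{E}\left[\|Z^m\|^2\,1_{\{\|Z^m\|\leq\sqrt{mL^2}\}}\right] \\
&= \frac{c_m(L)\sigma_G^2}{m}\left(\mathbb{E}\left[\|Z^m\|^2\right] - \mathbb{E}\left[\|Z^m\|^2\,1_{\{\|Z^m\|>\sqrt{mL^2}\}}\right]\right) \\
&\overset{(\text{using (19)})}{=} \frac{c_m(L)\sigma_G^2}{m}\left(m - m\,\psi\left(m+2,\sqrt{mL^2}\right)\right) \\
&= c_m(L)\left(1-\psi(m+2,L\sqrt{m})\right)\sigma_G^2 = d_m(L)\sigma_G^2. \qquad (28)
\end{aligned}$$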
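For $m = 1$ and $\sigma_G = 1$, identity (28) says that $d_1(L)$ is the variance of a standard Gaussian truncated to $[-L, L]$, for which a closed form is available. The sketch below compares the two; the function names and the series-based chi-square CDF are assumptions of this illustration.

```python
import math


def chi2_cdf(k, x, terms=400):
    """Chi-square CDF with k degrees of freedom (incomplete-gamma series)."""
    if x <= 0.0:
        return 0.0
    a, y = k / 2.0, x / 2.0
    total = sum(math.exp(-y + (a + n) * math.log(y) - math.lgamma(a + n + 1.0))
                for n in range(terms))
    return min(total, 1.0)


def d_m(m, L):
    """d_m(L) = Pr(||Z^{m+2}||^2 <= m L^2) / Pr(||Z^m||^2 <= m L^2)."""
    return chi2_cdf(m + 2, m * L * L) / chi2_cdf(m, m * L * L)


def truncated_normal_variance(L):
    """Variance of a standard Gaussian conditioned on |Z| <= L:
    1 - L sqrt(2/pi) exp(-L^2/2) / erf(L / sqrt(2))."""
    return 1.0 - L * math.sqrt(2.0 / math.pi) * math.exp(-L * L / 2.0) / math.erf(L / math.sqrt(2.0))
```

The truncation always reduces the per-component noise variance, which is why $d_m(L) < 1$, and $d_m(L) \to 1$ as $L$ grows.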



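We now compute $h(Z_L^m)$. With $f_G(\cdot)$ denoting the density of the untruncated Gaussian noise $Z_G^m$,

$$\begin{aligned}
h(Z_L^m) &= \int_{z^m\in\mathcal{S}_L^G} f_{Z_L}(z^m)\log_2\left(\frac{1}{f_{Z_L}(z^m)}\right)dz^m = \int_{z^m\in\mathcal{S}_L^G} f_{Z_L}(z^m)\log_2\left(\frac{\left(\sqrt{2\pi\sigma_G^2}\right)^m}{c_m(L)\,e^{-\frac{\|z^m\|^2}{2\sigma_G^2}}}\right)dz^m \\
&= -\log_2\left(c_m(L)\right) + \frac{m}{2}\log_2\left(2\pi\sigma_G^2\right) + \int_{z^m\in\mathcal{S}_L^G} c_m(L)\,f_G(z^m)\,\frac{\|z^m\|^2}{2\sigma_G^2}\log_2(e)\,dz^m. \qquad (29)
\end{aligned}$$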



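Analyzing the last term of (29),

$$\begin{aligned}
\int_{z^m\in\mathcal{S}_L^G} c_m(L)\,f_G(z^m)\,\frac{\|z^m\|^2}{2\sigma_G^2}\log_2(e)\,dz^m &= \frac{\log_2(e)}{2\sigma_G^2}\int_{z^m\in\mathcal{S}_L^G} c_m(L)\,\frac{\exp\left(-\frac{\|z^m\|^2}{2\sigma_G^2}\right)}{\left(\sqrt{2\pi\sigma_G^2}\right)^m}\,\|z^m\|^2\,dz^m \\
&\overset{(\text{using (23)})}{=} \frac{\log_2(e)}{2\sigma_G^2}\int_{z^m} f_{Z_L}(z^m)\,\|z^m\|^2\,dz^m = \frac{\log_2(e)}{2\sigma_G^2}\,\mathbb{E}\left[\|Z_L^m\|^2\right] \\
&\overset{(\text{using (28)})}{=} \frac{\log_2(e)}{2\sigma_G^2}\,m\,d_m(L)\,\sigma_G^2 = \frac{m}{2}\log_2\left(e^{d_m(L)}\right). \qquad (30)
\end{aligned}$$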



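The expression $C_{G,L}^{(m)}$ can now be upper bounded using (27), (29) and (30) as follows.

$$\begin{aligned}
C_{G,L}^{(m)} &\leq \frac{1}{2}\log_2\left(2\pi e\left(\bar{P}+d_m(L)\sigma_G^2\right)\right) + \frac{1}{m}\log_2\left(c_m(L)\right) - \frac{1}{2}\log_2\left(2\pi\sigma_G^2\right) - \frac{1}{2}\log_2\left(e^{d_m(L)}\right) \\
&= \frac{1}{2}\log_2\left(2\pi e\left(\bar{P}+d_m(L)\sigma_G^2\right)\right) + \frac{1}{2}\log_2\left(c_m^{\frac{2}{m}}(L)\right) - \frac{1}{2}\log_2\left(2\pi\sigma_G^2\right) - \frac{1}{2}\log_2\left(e^{d_m(L)}\right) \\
&= \frac{1}{2}\log_2\left(\frac{2\pi e\left(\bar{P}+d_m(L)\sigma_G^2\right)c_m^{\frac{2}{m}}(L)}{2\pi\sigma_G^2\,e^{d_m(L)}}\right) = \frac{1}{2}\log_2\left(\frac{e^{1-d_m(L)}\left(\bar{P}+d_m(L)\sigma_G^2\right)c_m^{\frac{2}{m}}(L)}{\sigma_G^2}\right). \qquad (31)
\end{aligned}$$

Now, recall that the rate-distortion function $D_m(R)$ for squared error distortion for source $X_0^m$ and reconstruction $\hat{X}_L^m$ is,

$$D_m(R) := \inf_{\substack{p(\hat{X}_L^m|X_0^m) \\ \frac{1}{m}I(X_0^m;\hat{X}_L^m)\leq R}} \frac{1}{m}\mathbb{E}_{X_0^m,Z_G^m}\left[\|X_0^m-\hat{X}_L^m\|^2\right], \qquad (32)$$

which is the dual of the rate-distortion function [57, Pg. 341]. Since $I(X_0^m;\hat{X}_L^m) \leq mC_{G,L}^{(m)}$, using the converse to the rate distortion theorem [57, Pg. 349] and the upper bound on the mutual information represented by $C_{G,L}^{(m)}$,

$$\frac{1}{m}\mathbb{E}_{X_0^m,Z_G^m}\left[\|X_0^m-\hat{X}_L^m\|^2\right] \geq D_m\left(C_{G,L}^{(m)}\right). \qquad (33)$$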



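Since the Gaussian source is iid, $D_m(R) = D(R)$, where $D(R) = \sigma_0^2\,2^{-2R}$ is the distortion-rate function for a Gaussian source of variance $\sigma_0^2$ [57, Pg. 346]. Thus, using (22), (24) and (33),

$$\mathbb{E}_{X_0^m,Z_G^m}\left[J_2^{(\gamma)}(X_0^m,Z_G^m)\,\middle|\,Z_G^m\in\mathcal{S}_L^G\right] \geq \left(\left(\sqrt{D\left(C_{G,L}^{(m)}\right)}-\sqrt{P}\right)^+\right)^2.$$

Substituting the bound on $C_{G,L}^{(m)}$ from (31),

$$D\left(C_{G,L}^{(m)}\right) = \sigma_0^2\,2^{-2C_{G,L}^{(m)}} \geq \frac{\sigma_0^2\sigma_G^2}{c_m^{\frac{2}{m}}(L)\,e^{1-d_m(L)}\left(\bar{P}+d_m(L)\sigma_G^2\right)}.$$

Using (22), this completes the proof of the lemma. Notice that $c_m(L) \to 1$ and $d_m(L) \to 1$ for fixed $m$ as $L \to \infty$, as well as for fixed $L > 1$ as $m \to \infty$. So the lower bound on $D\left(C_{G,L}^{(m)}\right)$ approaches $\kappa$ of Theorem 2 in both of these two limits.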
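The last two steps of the proof can be traced numerically: plugging the rate bound (31) into $D(R) = \sigma_0^2 2^{-2R}$ gives the closed-form expression for $\kappa_2$, and for large $L$ (with $\sigma_G^2 = 1$) this expression approaches $\kappa = \sigma_0^2 / ((\sigma_0+\sqrt{P})^2+1)$. The sketch below is an illustration under these assumptions; the function names are hypothetical and the chi-square CDF is a simple series implementation.

```python
import math


def chi2_cdf(k, x, terms=400):
    """Chi-square CDF with k degrees of freedom (incomplete-gamma series)."""
    if x <= 0.0:
        return 0.0
    a, y = k / 2.0, x / 2.0
    total = sum(math.exp(-y + (a + n) * math.log(y) - math.lgamma(a + n + 1.0))
                for n in range(terms))
    return min(total, 1.0)


def c_m(m, L):
    return 1.0 / chi2_cdf(m, m * L * L)


def d_m(m, L):
    return chi2_cdf(m + 2, m * L * L) / chi2_cdf(m, m * L * L)


def rate_bound(P, sigma0_sq, sigmaG_sq, L, m):
    """Right-hand side of (31), in bits per dimension."""
    c, d = c_m(m, L), d_m(m, L)
    P_bar = (math.sqrt(sigma0_sq) + math.sqrt(P)) ** 2
    return 0.5 * math.log2(math.exp(1.0 - d) * (P_bar + d * sigmaG_sq) * c ** (2.0 / m) / sigmaG_sq)


def distortion(sigma0_sq, R):
    """Gaussian distortion-rate function D(R) = sigma0^2 2^(-2R)."""
    return sigma0_sq * 2.0 ** (-2.0 * R)


def kappa2(P, sigma0_sq, sigmaG_sq, L, m):
    """Closed form sigma0^2 sigmaG^2 / (c^(2/m) e^(1-d) (P_bar + d sigmaG^2))."""
    c, d = c_m(m, L), d_m(m, L)
    P_bar = (math.sqrt(sigma0_sq) + math.sqrt(P)) ** 2
    return sigma0_sq * sigmaG_sq / (c ** (2.0 / m) * math.exp(1.0 - d) * (P_bar + d * sigmaG_sq))


def kappa(P, sigma0_sq):
    """Infinite-length quantity sigma0^2 / ((sigma0 + sqrt(P))^2 + 1)."""
    return sigma0_sq / ((math.sqrt(sigma0_sq) + math.sqrt(P)) ** 2 + 1.0)
```

For instance, `kappa2(0.5, 25.0, 1.0, 8.0, 2)` is numerically indistinguishable from `kappa(0.5, 25.0)`, illustrating the limiting statement at the end of the proof.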
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